import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import altair as alt
from sklearn.cluster import KMeans
import re
from wordcloud import WordCloud
# Show all columns in dataframes
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
np.seterr(invalid="ignore");
Yelp Data Exploration
This page walks through, step by step, how we acquired, formatted, and developed the neighborhood amenity guide for Philadelphia. We start by loading all of the Yelp data gathered through the Yelp Fusion API and matching each place to its census tract.
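The tract matching itself is a point-in-polygon spatial join. A minimal sketch with geopandas (the tract names and coordinates here are made up for illustration):

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical tracts as simple rectangles (planar coordinates, like EPSG:2272)
tracts = gpd.GeoDataFrame(
    {"nb_name": ["Tract A", "Tract B"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
)

# Hypothetical amenity points
amenities = gpd.GeoDataFrame(
    {"name": ["Cedar Park", "Clark Park"]},
    geometry=[Point(5, 5), Point(15, 5)],
)

# Each amenity inherits the attributes of the tract it falls within
joined = gpd.sjoin(amenities, tracts, how="left", predicate="within")
print(joined[["name", "nb_name"]])
```

Both layers must share a CRS before the join; projecting everything to EPSG:2272 takes care of that.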
Gathering Census Tracts
# phila bounds
phl_bound = gpd.read_file("data/Yelp API/phl_bound.geojson")
phl_neigh = gpd.read_file("https://raw.githubusercontent.com/opendataphilly/open-geo-data/master/Neighborhoods_Philadelphia/Neighborhoods_Philadelphia.geojson")
phl_tract = gpd.read_file("data/Yelp API/census-tracts.geojson")
phl_bound_proj = phl_bound.to_crs(2272)
phl_neigh_proj = phl_neigh.to_crs(2272)
phl_tract_proj = phl_tract.to_crs(2272)
neigh = phl_neigh_proj[["mapname", "geometry"]]
# Copying the DataFrame
tract = phl_tract_proj.copy()
# Concatenating 'neighborhood_name' and 'tract_id' into a new column 'nb_name'
tract['nb_name'] = tract['neighborhood_name'] + " " + tract['tract_id'].astype(str)
tract = tract.copy()
# Calculate area in square feet (EPSG:2272 uses US survey feet)
tract['area_ft2'] = tract['geometry'].area
# Convert area to square miles (1 square mile = 27,878,400 square feet)
tract['area_mi2'] = tract['area_ft2'] / 27878400
tract = tract[["nb_name", "area_mi2", "geometry"]]
Gathering All Amenities
Our amenities had broad and specific attributes that needed to be classified around what a potential home-buyer or tourist would seek. Some places fall into multiple categories, such as bars acting as both entertainment and nightlife.
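Because membership overlaps, the same business can legitimately appear in more than one category pull. A small illustration (the frames below are hypothetical) of how the overlap shows up when category pulls are stacked:

```python
import pandas as pd

# Hypothetical pulls: the same venue appears in two category queries
entertainment = pd.DataFrame({"name": ["Eris Temple", "Curio Theatre Company"],
                              "category": "entertainment"})
nightlife = pd.DataFrame({"name": ["Eris Temple"], "category": "nightlife"})

# Stack the pulls, dropping only exact (name, category) repeats
combined = pd.concat([entertainment, nightlife], ignore_index=True)
combined = combined.drop_duplicates(["name", "category"])

# A bar that is both entertainment and nightlife counts once in each category
print(combined["name"].value_counts())
```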
# amenities
restaurants = gpd.read_file("data/Yelp API/restaurants.geojson")
parks = gpd.read_file("data/Yelp API/parks.geojson")
education = gpd.read_file("data/Yelp API/education.geojson")
grocery = gpd.read_file("data/Yelp API/grocery.geojson")
entertainment = gpd.read_file("data/Yelp API/entertainment.geojson")
nightlife = gpd.read_file("data/Yelp API/nightlife.geojson")
shopping = gpd.read_file("data/Yelp API/shopping.geojson")
healthcare = gpd.read_file("data/Yelp API/healthcare.geojson")
historic = gpd.read_file("data/Yelp API/historic_landmarks.geojson")
kids = gpd.read_file("data/Yelp API/kids.geojson")
arts = gpd.read_file("data/Yelp API/perf_arts.geojson")
beauty = gpd.read_file("data/Yelp API/beauty.geojson")
Data Cleaning
Each category mentioned above includes sub-categories that describe the attributes of each place of interest. Every “alias” term matters because it ensures each place is reflected by all of its features. This can produce overlap in amenity counts, but that is acceptable: the overlap makes the neighborhood profiles more accurate, and businesses often provide more than one good or service.
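One way to see this overlap directly is to split the comma-separated alias string and count every term, not just the first. A sketch (the sample rows are made up to mirror the Yelp alias field):

```python
import pandas as pd

# Hypothetical rows mirroring the Yelp "alias" field
df = pd.DataFrame({
    "name": ["Fu-Wah Mini Market", "Carbon Copy"],
    "alias": ["convenience, sandwiches, grocery", "breweries, pizza, bars"],
})

# One row per (place, alias term): each place counts toward every category it carries
exploded = (
    df.assign(alias=df["alias"].str.split(","))
      .explode("alias")
      .assign(alias=lambda d: d["alias"].str.strip().str.lower())
)
print(exploded["alias"].value_counts())
```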
Restaurants
# Filter and create a copy to avoid SettingWithCopyWarning
food_desc = restaurants.copy()
# Create a new column 'desc_1'
food_desc["desc_1"] = food_desc['alias'].str.split(',').str[0].str.strip().str.lower()
food_desc["desc_2"] = food_desc['alias'].str.split(',').str[1].str.strip().str.lower()
food_desc["desc_3"] = food_desc['alias'].str.split(',').str[2].str.strip().str.lower()
First word in “alias”
# Group by 'desc_1' and count, then convert to DataFrame
food_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False).head(10)
| | desc_1 | count |
|---|---|---|
| 108 | pizza | 320 |
| 36 | chinese | 198 |
| 118 | restaurants | 182 |
| 45 | delis | 161 |
| 124 | seafood | 101 |
| 17 | breakfast_brunch | 88 |
| 77 | hotdogs | 84 |
| 100 | newamerican | 81 |
| 122 | sandwiches | 77 |
| 58 | fooddeliveryservices | 75 |
Second word in “alias”
# Group by 'desc_2' and count, then convert to DataFrame
food_desc.groupby("desc_2").size().reset_index(name='count').sort_values("count", ascending = False).head(10)
| | desc_2 | count |
|---|---|---|
| 111 | sandwiches | 120 |
| 12 | breakfast_brunch | 86 |
| 64 | hotdogs | 79 |
| 17 | burgers | 78 |
| 112 | seafood | 73 |
| 28 | chicken_wings | 66 |
| 131 | tradamerican | 65 |
| 72 | italian | 44 |
| 98 | pizza | 42 |
| 27 | cheesesteaks | 41 |
Third word in “alias”
# Group by 'desc_3' and count, then convert to DataFrame
food_desc.groupby("desc_3").size().reset_index(name='count').sort_values("count", ascending = False).head(10)
| | desc_3 | count |
|---|---|---|
| 83 | sandwiches | 124 |
| 97 | tradamerican | 61 |
| 20 | chicken_wings | 41 |
| 50 | hotdogs | 39 |
| 10 | breakfast_brunch | 38 |
| 13 | burgers | 33 |
| 25 | coffee | 30 |
| 31 | delis | 26 |
| 84 | seafood | 26 |
| 49 | hotdog | 23 |
Replace description 1 with description 2 when description 1 is irrelevant
Words like “restaurant” and “fooddeliveryservices” don’t tell us what kind of food is served. Let’s replace the description in these cases with something more descriptive.
In these cases we can search for restaurants where the first word isn’t descriptive, then replace the desc_1 with desc_2.
# replace desc_2 with desc_3 where desc_2 is not descriptive
condition1 = food_desc['desc_2'].isin(["food trucks", "foodtrucks", "fooddeliveryservices", "delis", "bars", "grocery", "pubs", "catering", "convenience",
"sportsbars", "popuprestaurants", "restaurants", "foodstands", "comfortfood", "food_court", "lounges", "breweries",
"wine_bars", "bowling", "meats", "newamerican", "tradamerican", "american", "buffets"])
condition2 = food_desc['desc_3'].notna()
# Update desc_2 where conditions are met
food_desc.loc[condition1 & condition2, 'desc_2'] = food_desc['desc_3']
# replace desc_1 with desc_2 where desc_1 is not descriptive
condition1 = food_desc['desc_1'].isin(["food trucks", "foodtrucks", "fooddeliveryservices", "delis", "bars", "grocery", "pubs", "catering", "convenience",
"sportsbars", "popuprestaurants", "restaurants", "foodstands", "comfortfood", "food_court", "lounges", "breweries",
"wine_bars", "bowling", "meats", "newamerican", "tradamerican", "american", "buffets"])
condition2 = food_desc['desc_2'].notna()
# Update desc_1 where conditions are met
food_desc.loc[condition1 & condition2, 'desc_1'] = food_desc['desc_2']
food_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False).head()
| | desc_1 | count |
|---|---|---|
| 111 | pizza | 339 |
| 37 | chinese | 203 |
| 124 | restaurants | 182 |
| 128 | sandwiches | 150 |
| 130 | seafood | 126 |
Unfortunately this didn’t replace “restaurants”. Places with only “restaurants” as the first word in their alias usually have no other descriptive terms, which shows that “restaurants” is the default label. This calls for another approach.
food_desc[food_desc["desc_1"] == "restaurants"].head(5)
| | name | rating | address | alias | title | geometry | desc_1 | desc_2 | desc_3 |
|---|---|---|---|---|---|---|---|---|---|
| 8 | Smitty's Restaurant | 0.0 | 5405 Chester Ave | restaurants | Restaurants | POINT (2676904.285 231029.670) | restaurants | NaN | NaN |
| 10 | Kan Yuen Ping | 0.0 | 5428 Chester Ave | restaurants | Restaurants | POINT (2676791.167 230669.291) | restaurants | NaN | NaN |
| 11 | Silver Place | 0.0 | 1500 S 58th St | restaurants | Restaurants | POINT (2673976.157 231063.740) | restaurants | NaN | NaN |
| 14 | Yuan East Restaurant | 0.0 | 1832 S 58th St | restaurants | Restaurants | POINT (2675345.361 229583.321) | restaurants | NaN | NaN |
| 17 | Wei Hua | 0.0 | 5140 Chester Ave | restaurants | Restaurants | POINT (2677863.415 231724.048) | restaurants | NaN | NaN |
By inspecting the names of restaurants lacking descriptions, we can parse out some common categories.
restaurant_cloud = food_desc[food_desc["desc_1"] == "restaurants"]
# Concatenate all text in the column
text = ' '.join(restaurant_cloud['name'].dropna())
# Create the word cloud
wordcloud = WordCloud(width=800, height=400, background_color ='white').generate(text)
# Display the word cloud using matplotlib
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()
# Define the mapping
keyword_mapping = {
'yuen': 'chinese',
'ping': 'chinese',
'zhong': 'chinese',
'china': 'chinese',
'canton': 'chinese',
'wok': 'chinese',
'lee': 'chinese',
'zhang': 'chinese',
'chung': 'chinese',
'dragon': 'chinese',
'meng': 'chinese',
'xin': 'chinese',
'hua': 'chinese',
'yi': 'chinese',
'xi': 'chinese',
'lam': 'chinese',
'wei': 'chinese',
'salad': 'salad',
'pizza': 'pizza',
'grill': 'grill',
"steak": "grill",
'sushi': 'japanese',
'chicken': 'chicken',
'pizzeria': 'pizza',
'bbq': 'bbq',
'hoagie': 'sandwich',
'steaks': 'sandwich',
'coffee': 'cafe',
'cafe': 'cafe',
'diner': 'american',
'luncheonette': 'american',
'wings': 'wings',
'deli': 'deli',
'mexican': 'latin',
'guadalupana': 'latin',
'guatamelteco': 'latin',
'fish': 'seafood',
'casa': 'latin'}
# Function to find the keyword and return the corresponding desc_2 value
def map_keyword_to_desc(name, mapping):
for keyword, desc in mapping.items():
if re.search(re.escape(keyword), name, re.IGNORECASE):
return desc
return None
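One caveat with this matcher: `re.search` on the raw keyword also fires on substrings (e.g. “lee” inside “Fleet”). If that ever becomes a problem, a word-boundary variant is a small change; the two-entry mapping below is made up for illustration:

```python
import re

# Hypothetical two-entry mapping, for illustration only
mapping = {"lee": "chinese", "pizza": "pizza"}

def map_keyword_word_bounded(name, mapping):
    # \b word boundaries prevent substring hits, e.g. "lee" inside "Fleet"
    for keyword, desc in mapping.items():
        if re.search(r"\b" + re.escape(keyword) + r"\b", name, re.IGNORECASE):
            return desc
    return None

print(map_keyword_word_bounded("Lee's Kitchen", mapping))      # "lee" matches as a whole word
print(map_keyword_word_bounded("Fleet Street Cafe", mapping))  # no whole-word match
```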
# Define the list of specific values
specific_values = ["food trucks", "fooddeliveryservices", "bars", "grocery", "restaurants"]
# Define the combined condition: desc_2 is NA or desc_1 is in the list of specific values
condition_combined = food_desc['desc_2'].isna() | food_desc['desc_1'].isin(specific_values)
# Apply the function to update desc_1 based on the combined condition
food_desc.loc[condition_combined, 'desc_1'] = food_desc.loc[condition_combined, 'name'].apply(lambda x: map_keyword_to_desc(x, keyword_mapping))
Reformat into larger categories
# Define the mapping dictionary
desc_mapping = {
"burgers": "sandwiches",
"hotdogs": "sandwiches",
"chicken_wings": "chicken",
"tradamerican": "american",
"newamerican": "american",
"wings": "chicken",
"hotdog": "diner/grill",
"gastropubs": "diner/grill",
"diners": "diner/grill",
"steak": "diner/grill",
"grills": "diner/grill",
"grill": "diner/grill",
"coffee": "bakeries/cafes",
'cafe': "bakeries/cafes",
"cafes": "bakeries/cafes",
'bagels': "bakeries/cafes",
'wraps': "bakeries/cafes",
'soups': "bakeries/cafes",
'soup': "bakeries/cafes",
"cupcakes": "bakeries/cafes",
"creperies": "bakeries/cafes",
"bakeries": "bakeries/cafes",
"juicebars": "bakeries/cafes",
'donuts': "bakeries/cafes",
'icecream': "bakeries/cafes",
"ice cream": 'bakeries/cafes',
"desserts": "bakeries/cafes",
"waffles": "bakeries/cafes",
"cakeshop": "bakeries/cafes",
'customcakes': 'bakeries/cafes',
"cheesesteaks": "sandwiches/delis",
"sandwich": "sandwiches/delis",
"sandwiches": "sandwiches/delis",
"delis": "sandwiches/delis",
"deli": "sandwiches/delis",
"chickenshop": "chicken",
"sushi": "japanese",
"ramen": "japanese",
'hotpot': 'chinese',
"szechuan": "chinese",
"cantonese": "chinese",
"dimsum": "chinese",
"shanghainese": "chinese",
"pretzels": "bakeries/cafes",
"mexican": "latin",
"honduran": "latin",
"nicaraguan": "latin",
"spanish": "latin",
"guatemalan": "latin",
"tacos": "latin",
"brazilian": "latin",
"portuguese": "latin",
"dominican": "caribbean",
"haitian": "caribbean",
"seafoodmarket": "seafood",
"seafoodmarkets": "seafood",
"greek": "mediterranean",
"mideastern": "halal"
}
# Replace desc_1 values using the mapping dictionary
food_desc['desc_1'] = food_desc['desc_1'].replace(desc_mapping)
food_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False).head()
| | desc_1 | count |
|---|---|---|
| 70 | pizza | 326 |
| 80 | sandwiches/delis | 241 |
| 8 | bakeries/cafes | 216 |
| 79 | sandwiches | 172 |
| 25 | chinese | 159 |
Identify fast food/chain restaurants
Override desc_1 with “fastfood” for major chain restaurants.
food_desc.groupby("name").size().sort_values(ascending = False).head(10)
name
Wawa 28
McDonald's 25
Popeyes Louisiana Kitchen 19
Wendy's 17
Philly Pretzel Factory 13
KFC 13
Subway 10
Wingstop 9
Taco Bell 9
Crown Fried Chicken 8
dtype: int64
fast_food = [
"McDonald's",
"Wawa",
"Wingstop",
"MrBeast Burger",
"Chick-fil-A",
"Chipotle Mexican Grill",
"Checkers",
"Wendy's",
"Subway",
"Popeyes Louisiana Kitchen",
"Taco Bell",
"KFC",
"Burger King",
]
food_desc.loc[food_desc["name"].isin(fast_food), "desc_1"] = "fastfood"
Take descriptions with frequency < 6 and replace them with the second term in the alias
desc_counts = food_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False)
# Filter for 'desc_1' values where count is less than 6
desc_less_than_six = desc_counts[desc_counts['count'] < 6]['desc_1']
# Convert the filtered results to a list
desc_list = desc_less_than_six.tolist()
# replace desc_1 with desc_2 where desc_1 is in the list of descriptions with frequency < 6
food_desc.loc[food_desc['desc_1'].isin(desc_list), 'desc_1'] = food_desc['desc_2']
food_cats = food_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False)
top_food_cats = list(food_cats[food_cats["count"] >= 12]["desc_1"])
food_desc = food_desc[food_desc["desc_1"].isin(top_food_cats)]
food_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False)
| | desc_1 | count |
|---|---|---|
| 18 | pizza | 327 |
| 21 | sandwiches/delis | 220 |
| 2 | bakeries/cafes | 214 |
| 10 | fastfood | 162 |
| 7 | chinese | 159 |
| 22 | seafood | 92 |
| 9 | diner/grill | 78 |
| 20 | sandwiches | 77 |
| 4 | breakfast_brunch | 76 |
| 6 | chicken | 67 |
| 16 | latin | 63 |
| 14 | japanese | 54 |
| 13 | italian | 42 |
| 0 | american | 38 |
| 19 | salad | 37 |
| 23 | soulfood | 37 |
| 11 | halal | 29 |
| 5 | caribbean | 25 |
| 17 | mediterranean | 23 |
| 3 | bbq | 17 |
| 8 | cocktailbars | 15 |
| 12 | indpak | 13 |
| 24 | venues | 13 |
| 25 | vietnamese | 13 |
| 1 | asianfusion | 12 |
| 15 | korean | 12 |
Parks
# Filter and create a copy to avoid SettingWithCopyWarning
parks_desc = parks.copy()
# Create a new column 'desc_1'
parks_desc["desc_1"] = parks_desc['alias'].str.split(',').str[0].str.strip().str.lower()
parks_desc["desc_2"] = parks_desc['alias'].str.split(',').str[1].str.strip().str.lower()
parks_desc["desc_3"] = parks_desc['alias'].str.split(',').str[2].str.strip().str.lower()
First word in ‘alias’
# Group by 'desc_1' and count, then convert to DataFrame
parks_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False)
| | desc_1 | count |
|---|---|---|
| 19 | parks | 150 |
| 15 | landmarks | 36 |
| 21 | playgrounds | 25 |
| 7 | dog_parks | 23 |
| 13 | hiking | 17 |
| 24 | recreation | 9 |
| 12 | gardens | 7 |
| 25 | skate_parks | 6 |
| 28 | tennis | 4 |
| 11 | football | 4 |
| 3 | basketballcourts | 2 |
| 6 | discgolf | 2 |
| 14 | kids_activities | 2 |
| 17 | museums | 2 |
| 18 | nonprofit | 2 |
| 23 | rafting | 2 |
| 20 | pets | 1 |
| 26 | stadiumsarenas | 1 |
| 22 | publicart | 1 |
| 27 | swimmingpools | 1 |
| 0 | amateursportsteams | 1 |
| 16 | localflavor | 1 |
| 1 | bars | 1 |
| 10 | fishing | 1 |
| 9 | farms | 1 |
| 8 | farmersmarket | 1 |
| 5 | climbing | 1 |
| 4 | bikes | 1 |
| 2 | baseballfields | 1 |
| 29 | theater | 1 |
Filter for the most frequent categories
park_desc_1 = parks_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False).head(9)
top_park_desc = list(park_desc_1["desc_1"])
# remove historical landmarks
top_park_desc.remove('landmarks')
# filter for obs in the list of top descriptions
parks_desc = parks_desc[parks_desc["desc_1"].isin(top_park_desc)]
parks_desc.head(10)
| | name | rating | address | alias | title | geometry | desc_1 | desc_2 | desc_3 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Cedar Park | 4.5 | 50th St & Baltimore Ave | parks | Parks | POINT (2677006.643 234133.514) | parks | NaN | NaN |
| 1 | Chester Avenue Dog Park | 3.5 | 801 S 48th St | dog_parks | Dog Parks | POINT (2678024.964 234174.704) | dog_parks | NaN | NaN |
| 2 | Malcolm X Park | 4.0 | 51ST And Pine St | parks | Parks | POINT (2676187.013 235721.062) | parks | NaN | NaN |
| 3 | Barkan Park | 3.0 | 4936 Spruce St | parks, playgrounds | Parks, Playgrounds | POINT (2677109.530 236278.545) | parks | playgrounds | NaN |
| 4 | Clark Park Dog Bowl | 4.5 | 43rd & Chester | dog_parks | Dog Parks | POINT (2680815.785 234312.734) | dog_parks | NaN | NaN |
| 5 | Clark Park | 4.5 | 43RD And Baltimore | parks | Parks | POINT (2680756.007 234729.865) | parks | NaN | NaN |
| 6 | Greys Ferry Cresent Skatepark | 5.0 | 3600 Grays Ferry Ave | skate_parks | Skate Parks | POINT (2683004.637 231400.562) | skate_parks | NaN | NaN |
| 7 | Grays Ferry Crescent | 4.0 | | parks | Parks | POINT (2682813.782 232182.835) | parks | NaN | NaN |
| 8 | Lanier Dog Park | 4.0 | 2911 Tasker St | dog_parks | Dog Parks | POINT (2685221.027 229056.524) | dog_parks | NaN | NaN |
| 9 | Saunder's Green | 3.5 | 300-50 Saunders Ave | parks | Parks | POINT (2683400.920 238616.685) | parks | NaN | NaN |
Education
# Filter and create a copy to avoid SettingWithCopyWarning
edu_desc = education.copy()
# Create a new column 'desc_1'
edu_desc["desc_1"] = edu_desc['alias'].str.split(',').str[0].str.strip().str.lower()
edu_desc["desc_2"] = edu_desc['alias'].str.split(',').str[1].str.strip().str.lower()
edu_desc["desc_3"] = edu_desc['alias'].str.split(',').str[2].str.strip().str.lower()
First word in alias
# Group by 'desc_1' and count, then convert to DataFrame
edu_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False)
| | desc_1 | count |
|---|---|---|
| 14 | childcare | 91 |
| 29 | elementaryschools | 86 |
| 66 | preschools | 62 |
| 17 | collegeuniv | 55 |
| 43 | highschools | 33 |
| 26 | driving_schools | 31 |
| 28 | educationservices | 25 |
| 75 | specialtyschools | 23 |
| 81 | theater | 17 |
| 6 | artschools | 15 |
| 19 | cosmetology_schools | 14 |
| 83 | tutoring | 13 |
| 3 | artclasses | 13 |
| 27 | education | 12 |
| 22 | dance_schools | 10 |
| 86 | vocation | 9 |
| 23 | dancestudio | 8 |
| 68 | privatetutors | 7 |
| 49 | montessori | 7 |
| 20 | cprclasses | 7 |
Filter for the most frequent categories
edu_desc_1 = edu_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False).head(18)
top_edu_desc = list(edu_desc_1["desc_1"])
# remove theater, which is not an education category
top_edu_desc.remove('theater')
# filter for obs in the list of top descriptions
edu_desc = edu_desc[edu_desc["desc_1"].isin(top_edu_desc)]
edu_desc.head(10)
| | name | rating | address | alias | title | geometry | desc_1 | desc_2 | desc_3 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | BellyWise | 5.0 | NaN | midwives, specialtyschools | Midwives, Specialty Schools | POINT (2676169.939 231988.984) | midwives | specialtyschools | NaN |
| 1 | Limitless CPR | 5.0 | NaN | cprclasses | CPR Classes | POINT (2677042.522 231394.414) | cprclasses | NaN | NaN |
| 2 | Michele Judge | 5.0 | 4919 Pentridge St | jewelryrepair, artclasses, jewelry | Jewelry Repair, Art Classes, Jewelry | POINT (2677304.635 233898.273) | jewelryrepair | artclasses | jewelry |
| 3 | Kipp West Philadelphia Prep Charter | 1.0 | 5900 Baltimore Ave | specialed | Special Education | POINT (2671666.162 232636.864) | specialed | NaN | NaN |
| 4 | Beulah Baptist Christian Day School | 2.0 | 5001 Spruce St | elementaryschools, preschools | Elementary Schools, Preschools | POINT (2677065.563 236331.076) | elementaryschools | preschools | NaN |
| 5 | The Academy of Industrial Arts | 5.0 | 6328 Paschall Ave | vocation, electricians | Vocational & Technical School, Electricians | POINT (2674851.743 225884.662) | vocation | electricians | NaN |
| 6 | TYLII | 3.0 | 5124 Walnut St | preschools, childcare | Preschools, Child Care & Day Care | POINT (2676415.568 237130.841) | preschools | childcare | NaN |
| 7 | University of the Sciences | 3.0 | 600 S 43rd St | collegeuniv | Colleges & Universities | POINT (2681062.408 233555.127) | collegeuniv | NaN | NaN |
| 8 | All Around This World | 5.0 | 4336 Pine St | educationservices | Educational Services | POINT (2680347.395 235381.376) | educationservices | NaN | NaN |
| 9 | Jubilee School | 3.0 | 4211 Chester Ave | highschools | Middle Schools & High Schools | POINT (2681043.809 234475.721) | highschools | NaN | NaN |
Grocery
# Filter and create a copy to avoid SettingWithCopyWarning
grocery_desc = grocery.copy()
# Create a new column 'desc_1'
grocery_desc["desc_1"] = grocery_desc['alias'].str.split(',').str[0].str.strip().str.lower()
grocery_desc["desc_2"] = grocery_desc['alias'].str.split(',').str[1].str.strip().str.lower()
grocery_desc["desc_3"] = grocery_desc['alias'].str.split(',').str[2].str.strip().str.lower()
grocery_desc.head()
| | name | rating | address | alias | title | geometry | desc_1 | desc_2 | desc_3 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Kim A Grocery & Deli | 1.0 | 1840 S 58th St | grocery | Grocery | POINT (2675396.135 229529.164) | grocery | NaN | NaN |
| 1 | S and J Seafood | 4.0 | 713 S 52nd St | seafoodmarkets, grocery | Seafood Markets, Grocery | POINT (2675793.250 234207.384) | seafoodmarkets | grocery | NaN |
| 2 | Mariposa Food Co-op | 4.0 | 4824 Baltimore Ave | grocery, healthmarkets | Grocery, Health Markets | POINT (2677563.153 234052.277) | grocery | healthmarkets | NaN |
| 3 | Jennifer Grocery | 2.5 | 4824 Chester Ave | grocery, convenience | Grocery, Convenience Stores | POINT (2678754.671 232771.760) | grocery | convenience | NaN |
| 4 | Fu-Wah Mini Market | 4.5 | 810 S 47th St | convenience, sandwiches, grocery | Convenience Stores, Sandwiches, Grocery | POINT (2678571.989 234207.924) | convenience | sandwiches | grocery |
Nightlife
# Filter and create a copy to avoid SettingWithCopyWarning
night_desc = nightlife.copy()
# Create a new column 'desc_1'
night_desc["desc_1"] = night_desc['alias'].str.split(',').str[0].str.strip().str.lower()
night_desc["desc_2"] = night_desc['alias'].str.split(',').str[1].str.strip().str.lower()
night_desc["desc_3"] = night_desc['alias'].str.split(',').str[2].str.strip().str.lower()
night_desc.head()
| | name | rating | address | alias | title | geometry | desc_1 | desc_2 | desc_3 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Pentridge Station Beer Garden | 4.5 | 5116 Pentridge St | beergardens | Beer Gardens | POINT (2676581.781 233167.788) | beergardens | NaN | NaN |
| 1 | 720 Bistro | 4.0 | 720 S.52nd st | seafood, breakfast_brunch, cocktailbars | Seafood, Breakfast & Brunch, Cocktail Bars | POINT (2675620.054 234102.746) | seafood | breakfast_brunch | cocktailbars |
| 2 | Bayou | 4.5 | 5025 Baltimore Ave | lounges | Lounges | POINT (2676525.799 234061.840) | lounges | NaN | NaN |
| 3 | Booker's Restaurant and Bar | 3.5 | 5021 Baltimore Ave | bars, breakfast_brunch, tradamerican | Bars, Breakfast & Brunch, American | POINT (2676559.222 234070.085) | bars | breakfast_brunch | tradamerican |
| 4 | The Wine Garden | 4.5 | 5019 Baltimore Ave | wine_bars | Wine Bars | POINT (2676563.551 234060.148) | wine_bars | NaN | NaN |
Alias
# Group by 'desc_1' and count, then convert to DataFrame
night_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False).head(10)
| | desc_1 | count |
|---|---|---|
| 6 | bars | 172 |
| 86 | sportsbars | 78 |
| 75 | pubs | 71 |
| 34 | divebars | 53 |
| 61 | lounges | 49 |
| 100 | tradamerican | 49 |
| 67 | newamerican | 47 |
| 23 | cocktailbars | 39 |
| 47 | hookah_bars | 28 |
| 66 | musicvenues | 28 |
# Group by 'desc_2' and count, then convert to DataFrame
night_desc.groupby("desc_2").size().reset_index(name='count').sort_values("count", ascending = False).head(10)
| | desc_2 | count |
|---|---|---|
| 7 | bars | 93 |
| 103 | tradamerican | 61 |
| 93 | sportsbars | 48 |
| 25 | cocktailbars | 43 |
| 69 | newamerican | 39 |
| 62 | lounges | 28 |
| 86 | seafood | 28 |
| 68 | musicvenues | 25 |
| 80 | pubs | 24 |
| 11 | beerbar | 19 |
# Group by 'desc_3' and count, then convert to DataFrame
night_desc.groupby("desc_3").size().reset_index(name='count').sort_values("count", ascending = False).head(10)
| | desc_3 | count |
|---|---|---|
| 17 | cocktailbars | 51 |
| 70 | tradamerican | 39 |
| 3 | bars | 24 |
| 8 | breakfast_brunch | 21 |
| 6 | beerbar | 20 |
| 72 | venues | 17 |
| 10 | burgers | 17 |
| 56 | sandwiches | 15 |
| 48 | musicvenues | 12 |
| 62 | sportsbars | 12 |
Replace desc_1 with more relevant alias terms
# replace desc_2
condition1 = night_desc['desc_2'].isin(["newamerican", "seafood", "breakfast_brunch", "tradamerican", "burgers", "sandwiches", "pizza", "mexican", "italian",
"french", "ramen", "tapasmallplates", "french", "ramen", "mideastern", "comfortfood", "steak", "mediterranean", "caribbean",
"bbq", "peruvian", "japanese", "ethiopian", "tex-mex", "southern", "juicebars", "cajun", "soulfood", "portugeuse", "sushi",
"vegan", "cafes", "diners", "chicken-wings", "chinese", "salad", "cheesesteaks", "african", "german", "desserts", "indpak",
"thai", "tacos", "falafel", "vietnamese", "waffles", "delis", "coffee", "korean", "hotdog", "moroccan", "newmexican", "cuban",
"argentine", "gluten_free", "russian", "bakeries", "portuguese", "spanish", "latin", "catering", "gastropubs",
"breweries", "gaybars", "whiskeybars", "tikibars", "tobaccoshops", "beer_and_wine", "breweries", "venues",
"poolhalls", "jazzandblues", "airportlounges", "cigarbars", "speakeasies", "theater", "nightlife"])
condition2 = night_desc['desc_3'].notna()
# Update desc_2 where conditions are met
night_desc.loc[condition1 & condition2, 'desc_2'] = night_desc['desc_3']
# replace desc_1
condition1 = night_desc['desc_1'].isin(["newamerican", "seafood", "breakfast_brunch", "tradamerican", "burgers", "sandwiches", "pizza", "mexican", "italian",
"french", "ramen", "tapasmallplates", "french", "ramen", "mideastern", "comfortfood", "steak", "mediterranean", "caribbean",
"bbq", "peruvian", "japanese", "ethiopian", "tex-mex", "southern", "juicebars", "cajun", "soulfood", "portugeuse", "sushi",
"vegan", "cafes", "diners", "chicken-wings", "chinese", "salad", "cheesesteaks", "african", "german", "desserts", "indpak",
"thai", "tacos", "falafel", "vietnamese", "waffles", "delis", "coffee", "korean", "hotdog", "moroccan", "newmexican", "cuban",
"argentine", "gluten_free", "russian", "bakeries", "portuguese", "spanish", "latin", "catering", "gastropubs",
"breweries", "gaybars", "whiskeybars", "tikibars", "tobaccoshops", "beer_and_wine", "breweries", "venues",
"poolhalls", "jazzandblues", "airportlounges", "cigarbars", "speakeasies", "theater", "nightlife"])
condition2 = night_desc['desc_2'].notna()
# Update desc_1 where conditions are met
night_desc.loc[condition1 & condition2, 'desc_1'] = night_desc['desc_2']
night_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False).head(15)
| | desc_1 | count |
|---|---|---|
| 4 | bars | 271 |
| 51 | sportsbars | 114 |
| 12 | cocktailbars | 105 |
| 45 | pubs | 86 |
| 39 | lounges | 64 |
| 20 | divebars | 60 |
| 6 | beerbar | 44 |
| 61 | wine_bars | 42 |
| 40 | musicvenues | 42 |
| 31 | hookah_bars | 36 |
| 17 | danceclubs | 32 |
| 34 | irish_pubs | 14 |
| 37 | karaoke | 13 |
| 7 | beergardens | 12 |
| 0 | airportlounges | 9 |
Filter for the most frequent categories
night_desc_1 = night_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False).head(9)
top_night_desc = list(night_desc_1["desc_1"])
# filter for obs in the list of top descriptions
night_desc = night_desc[night_desc["desc_1"].isin(top_night_desc)]
night_desc.head(10)
| | name | rating | address | alias | title | geometry | desc_1 | desc_2 | desc_3 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 720 Bistro | 4.0 | 720 S.52nd st | seafood, breakfast_brunch, cocktailbars | Seafood, Breakfast & Brunch, Cocktail Bars | POINT (2675620.054 234102.746) | cocktailbars | cocktailbars | cocktailbars |
| 2 | Bayou | 4.5 | 5025 Baltimore Ave | lounges | Lounges | POINT (2676525.799 234061.840) | lounges | NaN | NaN |
| 3 | Booker's Restaurant and Bar | 3.5 | 5021 Baltimore Ave | bars, breakfast_brunch, tradamerican | Bars, Breakfast & Brunch, American | POINT (2676559.222 234070.085) | bars | tradamerican | tradamerican |
| 4 | The Wine Garden | 4.5 | 5019 Baltimore Ave | wine_bars | Wine Bars | POINT (2676563.551 234060.148) | wine_bars | NaN | NaN |
| 5 | Carbon Copy | 4.5 | 701 S 50th St | breweries, pizza, bars | Breweries, Pizza, Bars | POINT (2676997.957 233927.765) | bars | bars | bars |
| 6 | Dock Street Cannery and Tasting Room | 4.0 | 705 S 50th St | beerbar, lounges | Beer Bar, Lounges | POINT (2677046.168 233912.738) | beerbar | lounges | NaN |
| 7 | Trendsetters Bar & Lounge | 3.5 | 5301 Woodland Ave | burgers, lounges, wraps | Burgers, Lounges, Wraps | POINT (2678439.957 230236.502) | lounges | lounges | wraps |
| 8 | The Barn | 3.5 | 4901 Catharine St | poolhalls, sportsbars, divebars | Pool Halls, Sports Bars, Dive Bars | POINT (2677365.738 234212.623) | sportsbars | sportsbars | divebars |
| 9 | Eris Temple | 4.0 | 602 S 52nd St | musicvenues | Music Venues | POINT (2675702.055 234852.608) | musicvenues | NaN | NaN |
| 10 | Dahlak | 3.5 | 4708 Baltimore Ave | ethiopian, divebars | Ethiopian, Dive Bars | POINT (2678487.076 234234.642) | divebars | divebars | NaN |
Entertainment
# Filter and create a copy to avoid SettingWithCopyWarning
entertain_desc = entertainment.copy()
# Create a new column 'desc_1'
entertain_desc["desc_1"] = entertain_desc['alias'].str.split(',').str[0].str.strip().str.lower()
entertain_desc["desc_2"] = entertain_desc['alias'].str.split(',').str[1].str.strip().str.lower()
entertain_desc["desc_3"] = entertain_desc['alias'].str.split(',').str[2].str.strip().str.lower()
entertain_desc.head()
| | name | rating | address | alias | title | geometry | desc_1 | desc_2 | desc_3 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Noam Osband | 5.0 | NaN | musicians | Musicians | POINT (2676083.283 232370.886) | musicians | NaN | NaN |
| 1 | Ceramic Concept | 5.0 | 5015 Baltimore Ave | galleries | Art Galleries | POINT (2676620.711 234054.592) | galleries | NaN | NaN |
| 2 | Eris Temple | 4.0 | 602 S 52nd St | musicvenues | Music Venues | POINT (2675702.055 234852.608) | musicvenues | NaN | NaN |
| 3 | Baltimore Avenue Dollar Stroll | 5.0 | 4800 Baltimore Ave | unofficialyelpevents, festivals | Unofficial Yelp Events, Festivals | POINT (2677842.004 234135.971) | unofficialyelpevents | festivals | NaN |
| 4 | Curio Theatre Company | 4.5 | 815 S 48th St | theater | Performing Arts | POINT (2678192.432 234160.959) | theater | NaN | NaN |
Alias
# Group by 'desc_1' and count, then convert to DataFrame
entertain_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False).head(10)
| | desc_1 | count |
|---|---|---|
| 130 | theater | 127 |
| 53 | galleries | 96 |
| 50 | festivals | 69 |
| 86 | museums | 42 |
| 42 | djs | 28 |
| 89 | musicvenues | 28 |
| 105 | psychics | 23 |
| 85 | movietheaters | 20 |
| 117 | social_clubs | 19 |
| 47 | eventplanning | 16 |
entertain_desc.groupby("desc_2").size().reset_index(name='count').sort_values("count", ascending = False).head(10)
| | desc_2 | count |
|---|---|---|
| 45 | galleries | 32 |
| 80 | musicvenues | 25 |
| 116 | theater | 24 |
| 119 | venues | 19 |
| 35 | dancestudio | 14 |
| 39 | eventplanning | 11 |
| 34 | danceclubs | 10 |
| 75 | movietheaters | 10 |
| 4 | arcades | 9 |
| 14 | bars | 8 |
entertain_desc.groupby("desc_3").size().reset_index(name='count').sort_values("count", ascending = False).head(10)
| | desc_3 | count |
|---|---|---|
| 59 | musicvenues | 13 |
| 92 | venues | 9 |
| 25 | dance_schools | 6 |
| 27 | dancestudio | 5 |
| 58 | musicians | 5 |
| 34 | eventplanning | 5 |
| 49 | kids_activities | 5 |
| 2 | arcades | 4 |
| 3 | artclasses | 4 |
| 89 | theater | 4 |
Replace desc_1 with more relevant alias terms
# replace desc_2
condition1 = entertain_desc['desc_2'].isin(["arts", "musicians", "kids_activities", "venues", "bars", "nonprofit", "lounges"])
condition2 = entertain_desc['desc_3'].notna()
# Update desc_1 where conditions are met
entertain_desc.loc[condition1 & condition2, 'desc_2'] = entertain_desc['desc_3']
# replace desc_1
condition1 = entertain_desc['desc_1'].isin(["arts", "musicians", "kids_activities", "venues", "bars", "nonprofit", "lounges"])
condition2 = entertain_desc['desc_2'].notna()
# Update desc_1 where conditions are met
entertain_desc.loc[condition1 & condition2, 'desc_1'] = entertain_desc['desc_2']
entertain_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False).head(40)

| | desc_1 | count |
|---|---|---|
| 131 | theater | 130 |
| 55 | galleries | 98 |
| 52 | festivals | 70 |
| 87 | museums | 42 |
| 90 | musicvenues | 34 |
| 43 | djs | 29 |
| 118 | social_clubs | 24 |
| 106 | psychics | 23 |
| 86 | movietheaters | 21 |
| 48 | eventplanning | 17 |
| 39 | danceclubs | 16 |
| 35 | countryclubs | 11 |
| 57 | gardens | 10 |
| 7 | artmuseums | 10 |
| 104 | psychic_astrology | 9 |
| 40 | dancestudio | 9 |
| 95 | paintandsip | 9 |
| 125 | stadiumsarenas | 9 |
| 8 | arts | 8 |
| 17 | bowling | 8 |
| 5 | arcades | 8 |
| 100 | photoboothrentals | 8 |
| 124 | sportsteams | 7 |
| 6 | artclasses | 7 |
| 37 | culturalcenter | 7 |
| 32 | comedyclubs | 6 |
| 105 | psychicmediums | 6 |
| 91 | newamerican | 6 |
| 82 | magicians | 5 |
| 71 | jazzandblues | 5 |
| 27 | catering | 5 |
| 26 | casinos | 5 |
| 15 | bingo | 5 |
| 54 | framing | 4 |
| 122 | specialtyschools | 4 |
| 123 | sportsbars | 4 |
| 0 | | 4 |
| 98 | partyequipmentrentals | 4 |
| 28 | clowns | 4 |
| 9 | artsandcrafts | 3 |
Filter for frequency
entertain_desc_1 = entertain_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False)
entertain_desc_1 = entertain_desc_1[entertain_desc_1['count'] >= 5]
top_entertain_desc = list(entertain_desc_1["desc_1"])
top_entertain_desc.remove('psychics')
# filter for obs in the list of top descriptions
entertain_desc = entertain_desc[entertain_desc["desc_1"].isin(top_entertain_desc)]
entertain_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False)
entertain_desc.head(5)

| | desc_1 | count |
|---|---|---|
| 31 | theater | 130 |
| 16 | galleries | 98 |
| 15 | festivals | 70 |
| 21 | museums | 42 |
| 22 | musicvenues | 34 |
| 13 | djs | 29 |
| 28 | social_clubs | 24 |
| 20 | movietheaters | 21 |
| 14 | eventplanning | 17 |
| 11 | danceclubs | 16 |
| 9 | countryclubs | 11 |
| 2 | artmuseums | 10 |
| 17 | gardens | 10 |
| 12 | dancestudio | 9 |
| 24 | paintandsip | 9 |
| 30 | stadiumsarenas | 9 |
| 26 | psychic_astrology | 9 |
| 25 | photoboothrentals | 8 |
| 5 | bowling | 8 |
| 3 | arts | 8 |
| 0 | arcades | 8 |
| 10 | culturalcenter | 7 |
| 1 | artclasses | 7 |
| 29 | sportsteams | 7 |
| 23 | newamerican | 6 |
| 27 | psychicmediums | 6 |
| 8 | comedyclubs | 6 |
| 19 | magicians | 5 |
| 7 | catering | 5 |
| 6 | casinos | 5 |
| 18 | jazzandblues | 5 |
| 4 | bingo | 5 |
# Define the mapping dictionary
desc_mapping = {
"djs": "nightclubs",
"danceclubs": "nightclubs",
"artmuseums": "museums",
"sportsteams": "sports",
"stadiumsarenas": "sports",
"paintandsip": "specialtyclasses",
"dancestudio": "specialtyclasses",
"bowling": "games/arcade",
"bingo": "games/arcade",
"arcades": "games/arcade",
"artclasses": "specialtyclasses"
}
# Replace desc_1 values using the mapping dictionary
entertain_desc.loc[:, 'desc_1'] = entertain_desc['desc_1'].replace(desc_mapping)
entertain_cat = entertain_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False)
entertain_cat_top = list(entertain_cat[entertain_cat["count"] >= 11]["desc_1"])
# filter for desc_1
entertain_desc = entertain_desc[entertain_desc["desc_1"].isin(entertain_cat_top)]
entertain_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False)
entertain_desc.head(5)

| | desc_1 | count |
|---|---|---|
| 12 | theater | 130 |
| 3 | galleries | 98 |
| 2 | festivals | 70 |
| 6 | museums | 52 |
| 8 | nightclubs | 45 |
| 7 | musicvenues | 34 |
| 10 | specialtyclasses | 25 |
| 9 | social_clubs | 24 |
| 4 | games/arcade | 21 |
| 5 | movietheaters | 21 |
| 1 | eventplanning | 17 |
| 11 | sports | 16 |
| 0 | countryclubs | 11 |
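The dictionary-based relabeling used above relies on `Series.replace`: aliases found in the mapping are swapped for their broader category, and everything else passes through unchanged. A minimal toy sketch (not the original data):

```python
import pandas as pd

# Aliases present in the mapping are consolidated; unmapped values are kept.
s = pd.Series(["djs", "bowling", "theater"])
mapped = s.replace({"djs": "nightclubs", "bowling": "games/arcade"})
print(list(mapped))  # ['nightclubs', 'games/arcade', 'theater']
```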
Shopping
# Filter and create a copy to avoid SettingWithCopyWarning
shop_desc = shopping.copy()
# Create a new column 'desc_1'
shop_desc["desc_1"] = shop_desc['alias'].str.split(',').str[0].str.strip().str.lower()
shop_desc["desc_2"] = shop_desc['alias'].str.split(',').str[1].str.strip().str.lower()
shop_desc["desc_3"] = shop_desc['alias'].str.split(',').str[2].str.strip().str.lower()
shop_desc.head()

| | name | rating | address | alias | title | geometry | desc_1 | desc_2 | desc_3 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Rite Aid | 2.0 | 5214-30 Baltimore Ave | drugstores, convenience | Drugstores, Convenience Stores | POINT (2675264.630 233732.096) | drugstores | convenience | NaN |
| 1 | Ajah Creative Sweets And Treats & More | 5.0 | NaN | desserts, cupcakes, giftshops | Desserts, Cupcakes, Gift Shops | POINT (2677939.712 232732.831) | desserts | cupcakes | giftshops |
| 2 | Ceramic Concept | 5.0 | 5015 Baltimore Ave | galleries | Art Galleries | POINT (2676620.711 234054.592) | galleries | NaN | NaN |
| 3 | VIX Emporium | 4.5 | 5009 Baltimore Ave | artsandcrafts, childcloth, jewelry | Arts & Crafts, Children's Clothing, Jewelry | POINT (2676676.410 234091.659) | artsandcrafts | childcloth | jewelry |
| 4 | Michele Judge | 5.0 | 4919 Pentridge St | jewelryrepair, artclasses, jewelry | Jewelry Repair, Art Classes, Jewelry | POINT (2677304.635 233898.273) | jewelryrepair | artclasses | jewelry |
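The same three-way alias split is repeated for every amenity type; it can be factored into a small helper. A sketch under hypothetical names (`split_alias` is not part of the original code):

```python
import pandas as pd

def split_alias(df, col="alias", n=3):
    # Split a comma-separated alias string into desc_1..desc_n columns,
    # stripped and lower-cased; missing positions become NA.
    parts = df[col].str.split(",", expand=True)
    for i in range(n):
        if i in parts.columns:
            df[f"desc_{i+1}"] = parts[i].str.strip().str.lower()
        else:
            df[f"desc_{i+1}"] = pd.NA
    return df

demo = split_alias(pd.DataFrame({"alias": ["Drugstores, Convenience", "galleries"]}))
print(demo["desc_1"].tolist())  # ['drugstores', 'galleries']
```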
Alias
# Group by 'desc_1' and count, then convert to DataFrame
shop_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False).head(10)

| | desc_1 | count |
|---|---|---|
| 32 | drugstores | 234 |
| 142 | womenscloth | 114 |
| 69 | jewelry | 90 |
| 31 | discountstore | 87 |
| 49 | galleries | 84 |
| 48 | furniture | 74 |
| 113 | shoes | 68 |
| 29 | deptstores | 56 |
| 86 | menscloth | 50 |
| 11 | bookstores | 41 |
# Group by 'desc_2' and count, then convert to DataFrame
shop_desc.groupby("desc_2").size().reset_index(name='count').sort_values("count", ascending = False).head(10)

| | desc_2 | count |
|---|---|---|
| 24 | convenience | 95 |
| 0 | accessories | 67 |
| 139 | womenscloth | 66 |
| 81 | menscloth | 51 |
| 48 | furniture | 38 |
| 65 | jewelry | 35 |
| 100 | pharmacy | 34 |
| 35 | discountstore | 32 |
| 49 | galleries | 29 |
| 61 | homedecor | 27 |
# Group by 'desc_3' and count, then convert to DataFrame
shop_desc.groupby("desc_3").size().reset_index(name='count').sort_values("count", ascending = False).head(10)

| | desc_3 | count |
|---|---|---|
| 21 | convenience | 55 |
| 0 | accessories | 30 |
| 115 | womenscloth | 28 |
| 54 | homedecor | 26 |
| 79 | pharmacy | 23 |
| 95 | sportswear | 18 |
| 92 | shoes | 18 |
| 56 | jewelry | 18 |
| 15 | childcloth | 16 |
| 108 | vintage | 15 |
Filter for most common labels
shop_desc_1 = shop_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False).head(20)
top_shop_desc = list(shop_desc_1["desc_1"])
top_shop_desc

['drugstores',
'womenscloth',
'jewelry',
'discountstore',
'galleries',
'furniture',
'shoes',
'deptstores',
'menscloth',
'bookstores',
'accessories',
'thrift_stores',
'antiques',
'grocery',
'giftshops',
'homedecor',
'selfstorage',
'childcloth',
'sportswear',
'vintage']
# remove drugstores and grocery
top_shop_desc.remove('drugstores')
top_shop_desc.remove('grocery')
# filter for obs in the list of top descriptions
shop_desc = shop_desc[shop_desc["desc_1"].isin(top_shop_desc)]
# Define the mapping dictionary
desc_mapping = {
"mattresses": "furniture",
"kitchenandbath": "furniture",
"rugs": "homedecor",
"hats": "accessories",
"watch_repair": "accessories"
}
# Replace desc_1 values using the mapping dictionary
shop_desc.loc[:, 'desc_1'] = shop_desc['desc_1'].replace(desc_mapping)
shop_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False).head(20)

| | desc_1 | count |
|---|---|---|
| 17 | womenscloth | 114 |
| 10 | jewelry | 90 |
| 5 | discountstore | 87 |
| 7 | galleries | 84 |
| 6 | furniture | 74 |
| 13 | shoes | 68 |
| 4 | deptstores | 56 |
| 11 | menscloth | 50 |
| 2 | bookstores | 41 |
| 0 | accessories | 40 |
| 15 | thrift_stores | 36 |
| 1 | antiques | 31 |
| 8 | giftshops | 24 |
| 12 | selfstorage | 23 |
| 9 | homedecor | 23 |
| 3 | childcloth | 21 |
| 14 | sportswear | 20 |
| 16 | vintage | 19 |
Healthcare
# Filter and create a copy to avoid SettingWithCopyWarning
health_desc = healthcare.copy()
# Create a new column 'desc_1'
health_desc["desc_1"] = health_desc['alias'].str.split(',').str[0].str.strip().str.lower()
health_desc["desc_2"] = health_desc['alias'].str.split(',').str[1].str.strip().str.lower()
health_desc["desc_3"] = health_desc['alias'].str.split(',').str[2].str.strip().str.lower()
Alias
# Group by 'desc_1' and count, then convert to DataFrame
health_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False).head(10)

| | desc_1 | count |
|---|---|---|
| 38 | laboratorytesting | 33 |
| 80 | urgent_care | 26 |
| 33 | homehealthcare | 25 |
| 43 | medcenters | 23 |
| 29 | generaldentistry | 21 |
| 35 | hospitals | 20 |
| 37 | internalmed | 19 |
| 21 | drugstores | 19 |
| 26 | familydr | 16 |
| 11 | chiropractors | 14 |
Filter for most common labels
health_desc_1 = health_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False).head(20)
top_health_desc = list(health_desc_1["desc_1"])
# filter for obs in the list of top descriptions
health_desc = health_desc[health_desc["desc_1"].isin(top_health_desc)]
Historic landmarks
# Filter and create a copy to avoid SettingWithCopyWarning
historic_desc = historic.copy()
# Create a new column 'desc_1'
historic_desc["desc_1"] = historic_desc['alias'].str.split(',').str[0].str.strip().str.lower()
historic_desc["desc_2"] = historic_desc['alias'].str.split(',').str[1].str.strip().str.lower()
historic_desc["desc_3"] = historic_desc['alias'].str.split(',').str[2].str.strip().str.lower()
Alias
# Group by 'desc_1' and count, then convert to DataFrame
historic_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False).head(10)

| | desc_1 | count |
|---|---|---|
| 1 | landmarks | 46 |
| 4 | museums | 25 |
| 0 | artmuseums | 2 |
| 2 | libraries | 2 |
| 9 | venues | 2 |
| 3 | municipality | 1 |
| 5 | nonprofit | 1 |
| 6 | postoffices | 1 |
| 7 | publicart | 1 |
| 8 | religiousorgs | 1 |
Filter for most common labels
historic_desc_1 = historic_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False).head(3)
top_historic_desc = list(historic_desc_1["desc_1"])
top_historic_desc

['landmarks', 'museums', 'artmuseums']
# filter for obs in the list of top descriptions
historic_desc = historic_desc[historic_desc["desc_1"].isin(top_historic_desc)]
# Define the mapping dictionary
desc_mapping = {
"artmuseums": "museums"
}
# Replace desc_1 values using the mapping dictionary
historic_desc.loc[:, 'desc_1'] = historic_desc['desc_1'].replace(desc_mapping)
historic_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False).head(20)

| | desc_1 | count |
|---|---|---|
| 0 | landmarks | 46 |
| 1 | museums | 27 |
Kids activities
kids_desc = kids.copy()
# Create a new column 'desc_1'
kids_desc["desc_1"] = kids_desc['alias'].str.split(',').str[0].str.strip().str.lower()
kids_desc["desc_2"] = kids_desc['alias'].str.split(',').str[1].str.strip().str.lower()
kids_desc["desc_3"] = kids_desc['alias'].str.split(',').str[2].str.strip().str.lower()
kids_desc.head()

| | name | rating | address | alias | title | geometry | desc_1 | desc_2 | desc_3 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Makono African Hair Braiding | 3.0 | 5529 Baltimore Ave | hairstylists, cosmetics | Hair Stylists, Cosmetics & Beauty Supply | POINT (2673677.520 233714.490) | hairstylists | cosmetics | NaN |
| 1 | Carbon Copy | 4.5 | 701 S 50th St | breweries, pizza, bars | Breweries, Pizza, Bars | POINT (2676997.957 233927.765) | breweries | pizza | bars |
| 2 | Firehouse Bicycles | 4.5 | 701 S 50th St | bikes, bike_repair_maintenance | Bikes, Bike Repair/Maintenance | POINT (2677005.367 233951.502) | bikes | bike_repair_maintenance | NaN |
| 3 | Common Beat Music | 4.0 | 4916 Baltimore Ave | vinyl_records, electronicsrepair, musicinstrumentservices | Vinyl Records, Electronics Repair, Musical Instrument Services | POINT (2677188.738 233998.088) | vinyl_records | electronicsrepair | musicinstrumentservices |
| 4 | Cedar Park | 4.5 | 50th St & Baltimore Ave | parks | Parks | POINT (2677006.643 234133.514) | parks | NaN | NaN |
Alias
# Group by 'desc_1' and count, then convert to DataFrame
kids_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False).head(10)

| | desc_1 | count |
|---|---|---|
| 45 | childcare | 92 |
| 232 | parks | 53 |
| 131 | generaldentistry | 39 |
| 87 | drugstores | 36 |
| 141 | gyms | 34 |
| 255 | preschools | 29 |
| 311 | theater | 26 |
| 95 | elementaryschools | 26 |
| 189 | libraries | 26 |
| 107 | festivals | 25 |
kids_desc = kids_desc[kids_desc["desc_1"].isin(["childcare", "parks", "preschools", "theater", "libraries", "festivals", "museums",
                                                "martialarts", "churches", "playgrounds", "musicalinstrumentsandteachers", "landmarks",
                                                "kids_activities"])]
kids_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False)

| | desc_1 | count |
|---|---|---|
| 0 | childcare | 92 |
| 9 | parks | 53 |
| 11 | preschools | 29 |
| 5 | libraries | 26 |
| 12 | theater | 26 |
| 2 | festivals | 25 |
| 7 | museums | 19 |
| 1 | churches | 18 |
| 6 | martialarts | 18 |
| 10 | playgrounds | 18 |
| 8 | musicalinstrumentsandteachers | 17 |
| 4 | landmarks | 15 |
| 3 | kids_activities | 13 |
Art and music
arts_desc = arts.copy()
# Create a new column 'desc_1'
arts_desc["desc_1"] = arts_desc['alias'].str.split(',').str[0].str.strip().str.lower()
arts_desc["desc_2"] = arts_desc['alias'].str.split(',').str[1].str.strip().str.lower()
arts_desc["desc_3"] = arts_desc['alias'].str.split(',').str[2].str.strip().str.lower()
arts_desc.head()

| | name | rating | address | alias | title | geometry | desc_1 | desc_2 | desc_3 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Curio Theatre Company | 4.5 | 815 S 48th St | theater | Performing Arts | POINT (2678192.432 234160.959) | theater | NaN | NaN |
| 1 | Studio 34 | 4.5 | 4522 Baltimore Ave | yoga, pilates, theater | Yoga, Pilates, Performing Arts | POINT (2679553.196 234477.061) | yoga | pilates | theater |
| 2 | Painted Bride Art Center | 3.5 | 5212 Market St | theater, venues | Performing Arts, Venues & Event Spaces | POINT (2676029.704 238278.040) | theater | venues | NaN |
| 3 | The Rotunda | 4.5 | 4014 Walnut St | musicvenues, theater | Music Venues, Performing Arts | POINT (2682266.656 236480.866) | musicvenues | theater | NaN |
| 4 | Penn Live Arts | 4.0 | 3680 Walnut St | theater, jazzandblues, musicvenues | Performing Arts, Jazz & Blues, Music Venues | POINT (2684326.329 236037.832) | theater | jazzandblues | musicvenues |
Alias
# Group by 'desc_1' and count, then convert to DataFrame
arts_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False).head(10)

| | desc_1 | count |
|---|---|---|
| 27 | theater | 163 |
| 7 | dancestudio | 11 |
| 18 | musicalinstrumentsandteachers | 11 |
| 12 | galleries | 9 |
| 20 | musicvenues | 6 |
| 19 | musicians | 5 |
| 11 | festivals | 5 |
| 6 | dance_schools | 5 |
| 3 | artschools | 4 |
| 1 | artmuseums | 4 |
Regroup into large categories
# Define the mapping dictionary
desc_mapping = {
"specialtyschools": "classes",
"dance_schools": "classes",
"artschools": "classes",
"musicalinstrumentsandteachers": "classes",
"dancestudio": "dancestudios",
"galleries": "gallery/museum",
"museums": "gallery/museum",
"artmuseums": "gallery/museum",
"festivals": "events/shows",
"jazzandblues": "events/shows",
"djs": "events/shows",
"musicvenues": "events/shows",
"comedyclubs": "events/shows",
"magicians": "events/shows",
"musicians": "events/shows"
}
# Replace desc_1 values using the mapping dictionary
arts_desc['desc_1'] = arts_desc['desc_1'].replace(desc_mapping)
arts_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False).head(20)

| | desc_1 | count |
|---|---|---|
| 16 | theater | 163 |
| 7 | events/shows | 24 |
| 2 | classes | 22 |
| 8 | gallery/museum | 15 |
| 4 | dancestudios | 11 |
| 3 | culturalcenter | 2 |
| 0 | aerialfitness | 1 |
| 12 | opera | 1 |
| 17 | venues | 1 |
| 15 | summer_camps | 1 |
| 14 | social_clubs | 1 |
| 13 | preschools | 1 |
| 9 | hair | 1 |
| 11 | nonprofit | 1 |
| 10 | language_schools | 1 |
| 1 | arts | 1 |
| 6 | educationservices | 1 |
| 5 | dinnertheater | 1 |
| 18 | yoga | 1 |
Filter for most common labels
arts_desc_1 = arts_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False).head(20)
top_arts_desc = list(arts_desc_1["desc_1"])
# remove school-related categories
top_arts_desc.remove('summer_camps')
top_arts_desc.remove('preschools')
top_arts_desc.remove('language_schools')
top_arts_desc.remove('educationservices')
top_arts_desc

['theater',
'events/shows',
'classes',
'gallery/museum',
'dancestudios',
'culturalcenter',
'aerialfitness',
'opera',
'venues',
'social_clubs',
'hair',
'nonprofit',
'arts',
'dinnertheater',
'yoga']
arts_desc = arts_desc[arts_desc["desc_1"].isin(top_arts_desc)]
Beauty
beauty_desc = beauty.copy()
# Create a new column 'desc_1'
beauty_desc["desc_1"] = beauty_desc['alias'].str.split(',').str[0].str.strip().str.lower()
beauty_desc["desc_2"] = beauty_desc['alias'].str.split(',').str[1].str.strip().str.lower()
beauty_desc["desc_3"] = beauty_desc['alias'].str.split(',').str[2].str.strip().str.lower()
beauty_desc.tail()

| | name | rating | address | alias | title | geometry | desc_1 | desc_2 | desc_3 |
|---|---|---|---|---|---|---|---|---|---|
| 2156 | Lips and Drips by Erica Marie | 5.0 | 2342 S Broad St | medicalspa, ivhydration | Medical Spas, IV Hydration | POINT (2691839.535 224599.252) | medicalspa | ivhydration | NaN |
| 2157 | The Lash Lounge Wyndmoor - 909 Willow | 4.0 | 909 E Willow Grove Ave | eyelashservice, eyebrowservices, threadingservices | Eyelash Service, Eyebrow Services, Threading Services | POINT (2685044.968 283383.097) | eyelashservice | eyebrowservices | threadingservices |
| 2158 | Hair Expo | 3.5 | 7721 Crittenden St | hairstylists | Hair Stylists | POINT (2683447.588 279731.355) | hairstylists | NaN | NaN |
| 2159 | The Cut Bar | 4.0 | 532 S 4th St | barbers | Barbers | POINT (2697625.976 232400.348) | barbers | NaN | NaN |
| 2160 | House of Hair Lounge | 3.0 | 259 S 60th St | hairstylists | Hair Stylists | POINT (2671622.181 237231.748) | hairstylists | NaN | NaN |
# Group by 'desc_1' and count, then convert to DataFrame
beauty_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending = False).head(10)

| | desc_1 | count |
|---|---|---|
| 85 | hair | 378 |
| 146 | othersalons | 342 |
| 12 | barbers | 149 |
| 89 | hairstylists | 87 |
| 43 | cosmetics | 82 |
| 119 | massage | 74 |
| 70 | florists | 53 |
| 117 | makeupartists | 49 |
| 86 | hair_extensions | 35 |
| 188 | tattoo | 33 |
remove_cat = ["florists", "jewelry", "drugstores", "generaldentistry", "discountstore", "womenscloth",
              "deptstores", "furniture", "", "bookstores", "galleries", "giftshops", "artsandcrafts",
              "gardening", "fabricstores", "antiques", "menscloth", "orthodontists", "vintage",
              "thriftstores", "stationary", "bikes", "beautysvc", "accessories", "optometrists",
              "framing", "interiordesign", "tobaccoshops", "homedecor"]
# Grouping, counting, and filtering beauty_desc
beauty_desc_1 = beauty_desc.groupby("desc_1").size().reset_index(name='count').sort_values("count", ascending=False)
top_beauty_desc = beauty_desc_1[beauty_desc_1['count'] > 6]
# Filtering top_beauty_desc to remove unwanted categories
top_beauty_desc = top_beauty_desc[~top_beauty_desc["desc_1"].isin(remove_cat)]["desc_1"]
beauty_desc = beauty_desc[beauty_desc["desc_1"].isin(top_beauty_desc)]
Combine all amenities
# Create explicit copies of the DataFrame slices
food_desc = food_desc.copy()
parks_desc = parks_desc.copy()
edu_desc = edu_desc.copy()
grocery_desc = grocery_desc.copy()
entertain_desc = entertain_desc.copy()
night_desc = night_desc.copy()
shop_desc = shop_desc.copy()
health_desc = health_desc.copy()
historic_desc = historic_desc.copy()
arts_desc = arts_desc.copy()
beauty_desc = beauty_desc.copy()
kids_desc = kids_desc.copy()
# Now perform the operations
food_desc.loc[:, 'type'] = 'restaurant'
parks_desc.loc[:, 'type'] = 'parks'
edu_desc.loc[:, 'type'] = 'education'
grocery_desc.loc[:, 'type'] = 'grocery'
entertain_desc.loc[:, 'type'] = 'entertainment'
night_desc.loc[:, 'type'] = 'nightlife'
shop_desc.loc[:, 'type'] = 'shopping'
health_desc.loc[:, 'type'] = 'healthcare'
historic_desc.loc[:, 'type'] = 'historic'
arts_desc.loc[:, 'type'] = 'arts'
beauty_desc.loc[:, 'type'] = 'beauty'
kids_desc.loc[:, 'type'] = 'kids'
# combine gdfs
amenities = gpd.GeoDataFrame(pd.concat([food_desc, parks_desc, edu_desc, grocery_desc, entertain_desc, night_desc, shop_desc,
                                       health_desc, historic_desc, arts_desc, beauty_desc, kids_desc], ignore_index=True))
Clip to Philly bounds
phl_bound_squeezed = phl_bound_proj.squeeze().geometry
amenities_phl = amenities.within(phl_bound_squeezed)
amenities_phl_gdf = amenities[amenities_phl]
amenities_phl_gdf.explore(tiles='CartoDB dark_matter', legend=True, column='type', cmap='Paired')
# count by type
amenities_grouped = amenities_phl_gdf.groupby('type').size().reset_index(name='count')
Amenities spatial patterns
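The clip step keeps only the amenities whose point geometry falls inside the city boundary via `.within()`. A self-contained sketch of the same mask on toy geometries (not the Philadelphia data):

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Toy boundary and two points: only the point inside the square survives
# the .within() mask, mirroring the clip to the city boundary above.
boundary = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])
pts = gpd.GeoDataFrame({"name": ["inside", "outside"]},
                       geometry=[Point(1, 1), Point(5, 5)])
clipped = pts[pts.within(boundary)]
print(clipped["name"].tolist())  # ['inside']
```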
# Extract unique business types
business_types = amenities_phl_gdf['type'].unique()
# Determine the number of rows and columns for the subplots
n_rows = len(business_types) // 3 + (len(business_types) % 3 > 0)
fig, axes = plt.subplots(n_rows, 3, figsize=(15, n_rows * 4))
# Flatten the axes array for easy looping
axes = axes.flatten()
# Create a map for each business type
for i, business_type in enumerate(business_types):
# Filter data for the current business type
subset = amenities_phl_gdf[amenities_phl_gdf['type'] == business_type]
# Get count for the current business type
count = amenities_grouped[amenities_grouped['type'] == business_type]['count'].values[0]
# Plotting with transparency
subset.plot(ax=axes[i], color='navy', markersize=1, alpha=1)
# Set title with count (n = count)
axes[i].set_title(f"{business_type.capitalize()} (n = {count})")
# Customizations: Remove boxes, axis ticks, and labels
axes[i].set_axis_off()
# Remove unused subplots
for j in range(i+1, len(axes)):
fig.delaxes(axes[j])
# Adjust layout
plt.tight_layout()
# Display the panel of maps
plt.show()
Intersect with neighborhoods
amenities_neighborhood = gpd.sjoin(amenities_phl_gdf, tract, how = "left", predicate = "intersects")
amenities_neighborhood.head()

| | name | rating | address | alias | title | geometry | desc_1 | desc_2 | desc_3 | type | index_right | nb_name | area_mi2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Rodflash Smoked BBQ | 0.0 | popuprestaurants | Pop-Up Restaurants | POINT (2676022.529 232162.402) | other | NaN | NaN | restaurant | 69 | Paschall-Kingsessing 08 | 0.160526 | |
| 1 | Gunpowder Cafe | 3.0 | NaN | fooddeliveryservices, indpak, cafes | Food Delivery Services, Indian, Cafes | POINT (2676022.529 232162.402) | indpak | indpak | cafes | restaurant | 69 | Paschall-Kingsessing 08 | 0.160526 |
| 2 | Lucky Chinese Restaurant | 0.0 | 5541 Chester Ave | chinese | Chinese | POINT (2676286.873 230415.145) | other | NaN | NaN | restaurant | 68 | Paschall-Kingsessing 07 | 0.097782 |
| 3 | J & J Restaurant | 5.0 | 5513 Chester Ave | caribbean | Caribbean | POINT (2676518.230 230597.923) | other | NaN | NaN | restaurant | 68 | Paschall-Kingsessing 07 | 0.097782 |
| 4 | Thunka Kitchen | 4.0 | 5629 Chester Ave | african | African | POINT (2675970.967 230117.411) | other | NaN | NaN | restaurant | 68 | Paschall-Kingsessing 07 | 0.097782 |
amenities_neigh_group = amenities_neighborhood.groupby(["nb_name", "type"]).size().reset_index(name='count')
amenities_neigh_group.head()

| | nb_name | type | count |
|---|---|---|---|
| 0 | Airport-Navy Yard 01 | entertainment | 2 |
| 1 | Airport-Navy Yard 01 | kids | 2 |
| 2 | Airport-Navy Yard 01 | nightlife | 1 |
| 3 | Airport-Navy Yard 01 | restaurant | 4 |
| 4 | Airport-Navy Yard 02 | beauty | 1 |
amenities_neigh_total = amenities_neigh_group.groupby("nb_name")['count'].sum().reset_index(name='total_amenities')
amenities_neigh_total.head()

| | nb_name | total_amenities |
|---|---|---|
| 0 | Airport-Navy Yard 01 | 9 |
| 1 | Airport-Navy Yard 02 | 69 |
| 2 | Airport-Navy Yard 03 | 53 |
| 3 | Airport-Navy Yard 04 | 46 |
| 4 | Airport-Navy Yard 05 | 57 |
amenities_neigh_group = amenities_neigh_group.merge(amenities_neigh_total, on="nb_name", how="left")
amenities_neigh_group.head()

| | nb_name | type | count | total_amenities |
|---|---|---|---|---|
| 0 | Airport-Navy Yard 01 | entertainment | 2 | 9 |
| 1 | Airport-Navy Yard 01 | kids | 2 | 9 |
| 2 | Airport-Navy Yard 01 | nightlife | 1 | 9 |
| 3 | Airport-Navy Yard 01 | restaurant | 4 | 9 |
| 4 | Airport-Navy Yard 02 | beauty | 1 | 69 |
amenities_neigh_group["pct_share"] = 100*amenities_neigh_group["count"]/amenities_neigh_group["total_amenities"]
amenities_neigh_group.head()

| | nb_name | type | count | total_amenities | pct_share |
|---|---|---|---|---|---|
| 0 | Airport-Navy Yard 01 | entertainment | 2 | 9 | 22.222222 |
| 1 | Airport-Navy Yard 01 | kids | 2 | 9 | 22.222222 |
| 2 | Airport-Navy Yard 01 | nightlife | 1 | 9 | 11.111111 |
| 3 | Airport-Navy Yard 01 | restaurant | 4 | 9 | 44.444444 |
| 4 | Airport-Navy Yard 02 | beauty | 1 | 69 | 1.449275 |
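The merge-then-divide computation of `pct_share` can also be written in one step with `groupby().transform()`, which avoids the separate totals table. A sketch on toy data:

```python
import pandas as pd

# Toy version of the pct_share calculation: each row's count as a percentage
# of its neighborhood total, without a separate totals merge.
df = pd.DataFrame({
    "nb_name": ["A", "A", "B"],
    "type":    ["restaurant", "parks", "restaurant"],
    "count":   [3, 1, 5],
})
df["pct_share"] = 100 * df["count"] / df.groupby("nb_name")["count"].transform("sum")
print(df["pct_share"].tolist())  # [75.0, 25.0, 100.0]
```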
amenities_neigh_group_gdf = amenities_neigh_group.merge(tract, on = "nb_name", how = "left")
amenities_neigh = gpd.GeoDataFrame(amenities_neigh_group_gdf, geometry='geometry')
amenities_nb = amenities_neigh[["nb_name", "type", "count", "total_amenities", "pct_share", "geometry", "area_mi2"]]
amenities_nb = amenities_nb.copy()
amenities_nb.loc[:, 'count_per_mi2'] = amenities_nb['count'] / amenities_nb['area_mi2']
amenities_nb[amenities_nb["type"] == "kids"].explore(tiles='CartoDB dark_matter', legend=True, column = "pct_share", cmap = "magma")
# Extract unique types
amenity_types = amenities_nb['type'].unique()
# Determine the number of rows and columns for the subplots
n_rows = len(amenity_types) // 3 + (len(amenity_types) % 3 > 0)
fig, axes = plt.subplots(n_rows, 3, figsize=(15, n_rows * 4))
# Flatten the axes array for easy looping
axes = axes.flatten()
# Create a choropleth map for each amenity type
for i, amenity_type in enumerate(amenity_types):
# Filter data for the current amenity type
subset = amenities_nb[amenities_nb['type'] == amenity_type]
# Plotting
subset.plot(column='pct_share', ax=axes[i], legend=True,
legend_kwds={'label': "share of all amenities"},
cmap='YlGnBu')
# Set title
axes[i].set_title(amenity_type.capitalize())
# Remove boxes, axis ticks, and axis labels
axes[i].set_axis_off()
# Remove unused subplots
for j in range(i+1, len(axes)):
fig.delaxes(axes[j])
# Adjust layout
plt.tight_layout()
# Display the panel of maps
plt.show()
Correlation analysis
# spreading the data
amenities_nb_wide = amenities_nb.pivot_table(index='nb_name', columns='type', values='pct_share', aggfunc='mean').fillna(0)
# Calculating the correlation matrix
correlation_matrix = amenities_nb_wide.corr()
import altair as alt
# Reset index to convert the index into a column for Altair
heatmap_data = correlation_matrix.reset_index().melt('type', var_name='type2', value_name='correlation')
# Create the heatmap
heatmap = alt.Chart(heatmap_data).mark_rect().encode(
x='type:N',
y='type2:N',
color='correlation:Q'
)
# Add text to each cell
text = heatmap.mark_text(baseline='middle').encode(
text=alt.Text('correlation:Q', format='.2f'),
color=alt.condition(
alt.datum.correlation > 0.5,
alt.value('white'),
alt.value('black')
)
)
# Display the chart
chart = (heatmap + text).properties(width=600, height=600)
chart
Cluster analysis
from sklearn.preprocessing import MinMaxScaler, RobustScaler
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
Overall (all features)
kmeans = KMeans(n_clusters=5, random_state=42, n_init=10)
amenities_nb_wide = amenities_nb.pivot_table(index='nb_name', columns='type', values='pct_share', fill_value=0).reset_index()
amenities_nb_scaled = scaler.fit_transform(amenities_nb_wide[["arts", "education", "kids", "nightlife", "parks", "restaurant", "shopping"]])
amenities_nb_scaled[:5]

array([[-0.38932863, -0.60659512,  1.59378172,  0.24376265, -0.3493857 ,
0.36173374, -0.69059262],
[-0.38932863, -0.60659512, -0.46384369, 0.28301402, -0.3493857 ,
1.55557841, 0.01792224],
[ 0.00852099, -0.60659512, -0.28913964, 1.18061134, -0.3493857 ,
0.53185073, -0.69059262],
[-0.38932863, -0.44980285, 0.14002464, -0.30575651, -0.01282637,
-0.39555578, 2.14346683],
[-0.01939828, -0.22699277, -0.30139958, -0.08884105, -0.3493857 ,
0.85784044, 0.59592121]])
amenities_nb_scaled.mean(axis=0)

array([ 3.59101787e-17, -4.30922144e-17, -1.43640715e-17,  8.61844289e-17,
2.39401191e-17, -7.06233514e-17, 8.25934110e-17])
amenities_nb_scaled.std(axis=0)

array([1., 1., 1., 1., 1., 1., 1.])
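The near-zero column means and unit standard deviations follow from what `StandardScaler` does: subtract each column's mean and divide by its population standard deviation. A quick sketch on toy data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# StandardScaler is equivalent to (x - mean) / std per column (ddof=0),
# which is why the scaled matrix has column means ~0 and std of 1.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
scaled = StandardScaler().fit_transform(X)
manual = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.allclose(scaled, manual))  # True
```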
# Perform the fit
kmeans.fit(amenities_nb_scaled)
# Extract the labels
amenities_nb_wide['label'] = kmeans.labels_
(
alt.Chart(amenities_nb_wide)
.mark_circle()
.encode(
alt.X("restaurant:Q", scale=alt.Scale(zero=False)),
alt.Y("education:Q", scale=alt.Scale(zero=False)),
color=alt.Color("label:N", scale=alt.Scale(scheme="dark2")),
tooltip=list(amenities_nb_wide.columns),
)
.properties(width=800, height=600)
.interactive()
)
amenities_nb_wide.groupby("label").size()

label
0 49
1 29
2 48
3 220
4 25
dtype: int64
# Average restaurants per cluster
amenities_nb_wide.groupby("label")['restaurant'].mean().sort_values()

label
2 8.320734
0 8.833224
1 12.488124
4 18.099593
3 50.925199
Name: restaurant, dtype: float64
# Average parks per cluster
amenities_nb_wide.groupby("label")['parks'].mean().sort_values()

label
2 0.614781
3 0.775837
1 0.852458
0 2.410477
4 19.769207
Name: parks, dtype: float64
amenities_clusters = amenities_nb_wide.merge(tract, how = "left", on = "nb_name")
amenities_clusters_gdf = gpd.GeoDataFrame(amenities_clusters, geometry = "geometry")
amenities_clusters_gdf.explore(
column="label",
cmap="Set3",
categorical=True,
legend=True,
tiles="CartoDB positron"
)
amenities_clusters_gdf.to_file('clusters.geojson', driver='GeoJSON', index=False)
cluster_profile = amenities_nb_wide.groupby("label")[["arts", "education", "kids", "nightlife", "parks", "restaurant", "shopping"]].mean()
cluster_profile

| type | arts | education | kids | nightlife | parks | restaurant | shopping |
|---|---|---|---|---|---|---|---|
| label | |||||||
| 0 | 10.122255 | 10.110204 | 4.389460 | 21.820754 | 2.410477 | 8.833224 | 7.430543 |
| 1 | 0.567573 | 42.981270 | 8.490453 | 2.605510 | 0.852458 | 12.488124 | 2.235311 |
| 2 | 0.905578 | 5.607784 | 2.897750 | 10.489589 | 0.614781 | 8.320734 | 31.978278 |
| 3 | 0.529042 | 4.282592 | 2.856462 | 5.174781 | 0.775837 | 50.925199 | 5.233768 |
| 4 | 0.508021 | 6.682243 | 25.187898 | 8.897605 | 19.769207 | 18.099593 | 1.166982 |
# Number of clusters to try out
n_clusters = list(range(2, 15))
# Run kmeans for each value of k
inertias = []
for k in n_clusters:
# Initialize and run
kmeans = KMeans(n_clusters=k, n_init=20)
kmeans.fit(amenities_nb_scaled)
# Save the "inertia"
inertias.append(kmeans.inertia_)
# Plot it!
plt.plot(n_clusters, inertias, marker='o', ms=10, mfc='white', lw=4, mew=3);
from kneed import KneeLocator
# Initialize the knee algorithm
kn = KneeLocator(n_clusters, inertias, curve='convex', direction='decreasing')
# Print out the knee
print(kn.knee)
8
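The knee is the point of maximum curvature of the inertia curve. As a rough cross-check of what `KneeLocator` computes, the curvature can be approximated with a second difference (the inertia values below are hypothetical, not the notebook's):

```python
import numpy as np

# Hypothetical inertia curve for k = 2..8
ks = np.arange(2, 9)
inertias = np.array([900.0, 600.0, 420.0, 330.0, 300.0, 285.0, 278.0])

# Second difference of the curve; its maximum marks the sharpest bend
second_diff = np.diff(inertias, n=2)
knee_k = ks[1:-1][np.argmax(second_diff)]
print(knee_k)
```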

Restaurants
kmeans = KMeans(n_clusters=9, random_state=42, n_init=10)
restaurant = food_desc[["name", "desc_1"]]
nb_gdf = amenities_neighborhood[["nb_name", "name", "rating"]]
restaurants_nb_gdf = restaurant.merge(nb_gdf, on = "name", how = "left")
restaurants_nb_grouped = restaurants_nb_gdf.groupby(["nb_name", "desc_1"]).size().reset_index(name="count")
# Sort within each neighborhood group by count in descending order
restaurants_nb_grouped = restaurants_nb_grouped.sort_values(by=["nb_name", "count"], ascending=[True, False])
restaurants_nb_grouped.head()
|   | nb_name | desc_1 | count |
|---|---|---|---|
| 0 | Airport-Navy Yard 01 | italian | 2 |
| 1 | Airport-Navy Yard 01 | pizza | 1 |
| 2 | Airport-Navy Yard 01 | sandwiches/delis | 1 |
| 8 | Airport-Navy Yard 02 | fastfood | 46 |
| 4 | Airport-Navy Yard 02 | bakeries/cafes | 17 |
restaurants_nb_wide = restaurants_nb_grouped.pivot(index='nb_name', columns='desc_1', values='count').reset_index().fillna(0)
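`pivot` reshapes the long neighborhood/cuisine counts into one row per neighborhood with one column per cuisine; combinations with no restaurants become NaN, which `fillna(0)` turns into zero counts. A toy version with invented names:

```python
import pandas as pd

# Long-format counts: one row per (neighborhood, cuisine) pair
long_df = pd.DataFrame({
    "nb_name": ["A", "A", "B"],
    "desc_1": ["pizza", "chinese", "pizza"],
    "count": [3, 1, 2],
})

# Wide format: B has no chinese row, so that cell is NaN before fillna(0)
wide = long_df.pivot(index="nb_name", columns="desc_1", values="count").reset_index().fillna(0)
print(wide)
```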
# Calculating the correlation matrix
correlation_matrix = restaurants_nb_wide.corr(numeric_only=True)
import altair as alt
# Reset index to convert the index into a column for Altair
heatmap_data = correlation_matrix.reset_index().melt('desc_1', var_name='desc_2', value_name='correlation')
# Calculating the total restaurants for each neighborhood
restaurants_total = restaurants_nb_wide.set_index('nb_name').sum(axis=1)
# Using 'map' to align and assign the total restaurants back to the original DataFrame
restaurants_nb_wide["total_restaurants"] = restaurants_nb_wide['nb_name'].map(restaurants_total)
restaurants_nb_wide.head()
| desc_1 | nb_name | american | asianfusion | bakeries/cafes | bbq | breakfast_brunch | caribbean | chicken | chinese | cocktailbars | diner/grill | fastfood | halal | indpak | italian | japanese | korean | latin | mediterranean | pizza | salad | sandwiches | sandwiches/delis | seafood | soulfood | venues | vietnamese | total_restaurants |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Airport-Navy Yard 01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 |
| 1 | Airport-Navy Yard 02 | 2.0 | 0.0 | 17.0 | 0.0 | 6.0 | 0.0 | 2.0 | 0.0 | 0.0 | 3.0 | 46.0 | 0.0 | 0.0 | 4.0 | 2.0 | 0.0 | 4.0 | 3.0 | 4.0 | 11.0 | 2.0 | 6.0 | 7.0 | 0.0 | 0.0 | 0.0 | 119.0 |
| 2 | Airport-Navy Yard 03 | 4.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 2.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 6.0 | 0.0 | 5.0 | 3.0 | 3.0 | 0.0 | 0.0 | 0.0 | 31.0 |
| 3 | Airport-Navy Yard 04 | 2.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 1.0 | 0.0 | 0.0 | 8.0 |
| 4 | Airport-Navy Yard 05 | 5.0 | 0.0 | 6.0 | 0.0 | 2.0 | 0.0 | 0.0 | 2.0 | 0.0 | 1.0 | 67.0 | 0.0 | 0.0 | 3.0 | 2.0 | 0.0 | 3.0 | 0.0 | 0.0 | 6.0 | 2.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 101.0 |
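`Series.map` aligns on the index: the row totals are keyed by `nb_name`, so mapping them through the `nb_name` column attaches each neighborhood's total regardless of row order. A toy version with made-up columns:

```python
import pandas as pd

# Toy wide table of cuisine counts per neighborhood
df = pd.DataFrame({"nb_name": ["A", "B"], "pizza": [3, 2], "chinese": [1, 0]})

# Row totals, indexed by neighborhood name
totals = df.set_index("nb_name")[["pizza", "chinese"]].sum(axis=1)

# map() looks up each nb_name value in the totals Series index
df["total"] = df["nb_name"].map(totals)
print(df["total"].tolist())
```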
# Columns representing different cuisines
cuisine_columns = ['bakeries/cafes',
'chicken', 'chinese', 'diner/grill',
'fastfood', "korean",
'pizza',
'sandwiches/delis', 'vietnamese']
# Convert cuisine counts to percentages of the total restaurants
for column in cuisine_columns:
restaurants_nb_wide[column] = ((restaurants_nb_wide[column] / restaurants_nb_wide['total_restaurants']) * 100).round(2)
# Check the updated DataFrame
restaurants_nb_wide.head()
| desc_1 | nb_name | american | asianfusion | bakeries/cafes | bbq | breakfast_brunch | caribbean | chicken | chinese | cocktailbars | diner/grill | fastfood | halal | indpak | italian | japanese | korean | latin | mediterranean | pizza | salad | sandwiches | sandwiches/delis | seafood | soulfood | venues | vietnamese | total_restaurants |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Airport-Navy Yard 01 | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.00 | 0.00 | 0.0 | 0.00 | 0.00 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 25.00 | 0.0 | 0.0 | 25.00 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 |
| 1 | Airport-Navy Yard 02 | 2.0 | 0.0 | 14.29 | 0.0 | 6.0 | 0.0 | 1.68 | 0.00 | 0.0 | 2.52 | 38.66 | 0.0 | 0.0 | 4.0 | 2.0 | 0.0 | 4.0 | 3.0 | 3.36 | 11.0 | 2.0 | 5.04 | 7.0 | 0.0 | 0.0 | 0.0 | 119.0 |
| 2 | Airport-Navy Yard 03 | 4.0 | 2.0 | 0.00 | 0.0 | 0.0 | 0.0 | 6.45 | 0.00 | 2.0 | 12.90 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 19.35 | 0.0 | 5.0 | 9.68 | 3.0 | 0.0 | 0.0 | 0.0 | 31.0 |
| 3 | Airport-Navy Yard 04 | 2.0 | 0.0 | 12.50 | 0.0 | 0.0 | 0.0 | 0.00 | 0.00 | 0.0 | 12.50 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.00 | 3.0 | 1.0 | 0.0 | 0.0 | 8.0 |
| 4 | Airport-Navy Yard 05 | 5.0 | 0.0 | 5.94 | 0.0 | 2.0 | 0.0 | 0.00 | 1.98 | 0.0 | 0.99 | 66.34 | 0.0 | 0.0 | 3.0 | 2.0 | 0.0 | 3.0 | 0.0 | 0.00 | 6.0 | 2.0 | 0.00 | 0.0 | 0.0 | 2.0 | 0.0 | 101.0 |
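The per-column loop above can also be written as a single vectorized division: `div(..., axis=0)` divides every cuisine column by the row total in one step. A sketch with invented counts:

```python
import pandas as pd

# Toy counts in the same shape as restaurants_nb_wide (names assumed)
df = pd.DataFrame({
    "nb_name": ["A", "B"],
    "pizza": [2.0, 5.0],
    "fastfood": [6.0, 15.0],
    "total_restaurants": [8.0, 20.0],
})
cuisines = ["pizza", "fastfood"]

# Divide each cuisine column by the row total, express as a rounded percentage
df[cuisines] = df[cuisines].div(df["total_restaurants"], axis=0).mul(100).round(2)
print(df)
```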
# Create the heatmap
heatmap = alt.Chart(heatmap_data).mark_rect().encode(
x='desc_1:N',
y='desc_2:N',
color='correlation:Q'
)
# Add text to each cell
text = heatmap.mark_text(baseline='middle').encode(
text=alt.Text('correlation:Q', format='.2f'),
color=alt.condition(
alt.datum.correlation > 0.5,
alt.value('white'),
alt.value('black')
)
)
# Display the chart
chart = (heatmap + text).properties(width=1000, height=1000)
chart
restaurants_nb_scaled = scaler.fit_transform(restaurants_nb_wide[['bakeries/cafes',
'chicken', 'chinese', 'diner/grill',
'fastfood', "korean",
'pizza',
'sandwiches/delis', 'vietnamese']])
restaurants_nb_scaled[:5]
restaurants_nb_scaled.std(axis=0)
array([1., 1., 1., 1., 1., 1., 1., 1., 1.])
restaurants_nb_scaled.mean(axis=0)
array([-1.20970537e-16, -1.43254584e-17,  1.33704278e-16, -3.34260696e-17,
-5.73018335e-17, 1.27337408e-17, 0.00000000e+00, -6.84438567e-17,
-6.36687039e-18])
# Perform the fit
kmeans.fit(restaurants_nb_scaled)
# Extract the labels
restaurants_nb_wide['label'] = kmeans.labels_
(
alt.Chart(restaurants_nb_wide)
.mark_circle()
.encode(
alt.X("bakeries/cafes:Q", scale=alt.Scale(zero=False)),
alt.Y("chinese:Q", scale=alt.Scale(zero=False)),
color=alt.Color("label:N", scale=alt.Scale(scheme="dark2")),
tooltip=list(restaurants_nb_wide.columns),
)
.properties(width=800, height=600)
.interactive()
)
restaurants_nb_wide.groupby("label").size()
label
0 81
1 26
2 18
3 97
4 26
5 1
6 1
7 22
8 7
dtype: int64
# Average bakeries/cafes share per cluster
restaurants_nb_wide.groupby("label")['bakeries/cafes'].mean().sort_values()
label
5 0.000000
6 0.000000
1 0.349615
8 0.475714
0 3.369012
4 7.684231
7 7.992273
3 8.589588
2 72.334444
Name: bakeries/cafes, dtype: float64
restaurants_clusters = restaurants_nb_wide.merge(tract, how = "left", on = "nb_name")
restaurants_clusters_gdf = gpd.GeoDataFrame(restaurants_clusters, geometry = "geometry")
restaurants_clusters_gdf.explore(
column="label",
cmap="Set3",
categorical=True,
legend=True,
tiles="CartoDB positron"
)
food_profile = restaurants_nb_wide.groupby("label")[['bakeries/cafes',
'chicken', 'chinese', 'diner/grill',
'fastfood', "korean",
'pizza',
'sandwiches/delis', 'vietnamese']].mean().round()
food_profile
| desc_1 | bakeries/cafes | chicken | chinese | diner/grill | fastfood | korean | pizza | sandwiches/delis | vietnamese |
|---|---|---|---|---|---|---|---|---|---|
| label | |||||||||
| 0 | 3.0 | 2.0 | 3.0 | 1.0 | 71.0 | 0.0 | 4.0 | 3.0 | 0.0 |
| 1 | 0.0 | 0.0 | 4.0 | 1.0 | 0.0 | 1.0 | 76.0 | 5.0 | 0.0 |
| 2 | 72.0 | 1.0 | 3.0 | 0.0 | 0.0 | 0.0 | 12.0 | 4.0 | 0.0 |
| 3 | 9.0 | 1.0 | 4.0 | 1.0 | 1.0 | 0.0 | 11.0 | 25.0 | 0.0 |
| 4 | 8.0 | 1.0 | 50.0 | 0.0 | 1.0 | 0.0 | 14.0 | 7.0 | 0.0 |
| 5 | 0.0 | 0.0 | 33.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 67.0 |
| 6 | 0.0 | 0.0 | 33.0 | 0.0 | 0.0 | 33.0 | 33.0 | 0.0 | 0.0 |
| 7 | 8.0 | 6.0 | 5.0 | 22.0 | 0.0 | 0.0 | 9.0 | 8.0 | 0.0 |
| 8 | 0.0 | 50.0 | 11.0 | 1.0 | 7.0 | 0.0 | 10.0 | 7.0 | 0.0 |
restaurants_clusters_gdf.to_file('food_clusters.geojson', driver='GeoJSON', index=False)
# Number of clusters to try out
n_clusters = list(range(2, 15))
# Run kmeans for each value of k
inertias = []
for k in n_clusters:
# Initialize and run
kmeans = KMeans(n_clusters=k, n_init=10)
kmeans.fit(restaurants_nb_scaled)
# Save the "inertia"
inertias.append(kmeans.inertia_)
# Plot it!
plt.plot(n_clusters, inertias, marker='o', ms=10, mfc='white', lw=4, mew=3);
from kneed import KneeLocator
# Initialize the knee algorithm
kn = KneeLocator(n_clusters, inertias, curve='convex', direction='decreasing')
# Print out the knee
print(kn.knee)
9

Nightlife
nightlife = night_desc[["name", "desc_1"]]
nb_gdf = amenities_neighborhood[["neighborhood_name", "name", "rating"]]
nightlife_nb_gdf = nightlife.merge(nb_gdf, on = "name", how = "left")
nightlife_nb_grouped = nightlife_nb_gdf.groupby(["neighborhood_name", "desc_1"]).size().reset_index(name="count")
# Sort within each neighborhood group by count in descending order
nightlife_nb_grouped = nightlife_nb_grouped.sort_values(by=["neighborhood_name", "count"], ascending=[True, False])
nightlife_nb_grouped.head(20)
|   | neighborhood_name | desc_1 | count |
|---|---|---|---|
| 5 | Airport-Navy Yard | sportsbars | 21 |
| 6 | Airport-Navy Yard | wine_bars | 14 |
| 0 | Airport-Navy Yard | bars | 13 |
| 2 | Airport-Navy Yard | cocktailbars | 5 |
| 1 | Airport-Navy Yard | beerbar | 2 |
| 4 | Airport-Navy Yard | pubs | 2 |
| 3 | Airport-Navy Yard | lounges | 1 |
| 9 | Bustleton | sportsbars | 14 |
| 7 | Bustleton | bars | 2 |
| 8 | Bustleton | cocktailbars | 2 |
| 10 | Center City East | bars | 42 |
| 12 | Center City East | cocktailbars | 28 |
| 13 | Center City East | lounges | 13 |
| 14 | Center City East | musicvenues | 12 |
| 15 | Center City East | pubs | 10 |
| 16 | Center City East | sportsbars | 10 |
| 17 | Center City East | wine_bars | 9 |
| 11 | Center City East | beerbar | 5 |
| 18 | Center City West | bars | 28 |
| 20 | Center City West | cocktailbars | 19 |
nightlife_nb_wide = nightlife_nb_grouped.pivot(index='neighborhood_name', columns='desc_1', values='count').fillna(0).reset_index()
nightlife_nb_scaled = scaler.fit_transform(nightlife_nb_wide[['bars', 'beerbar', 'cocktailbars', 'divebars', 'lounges', 'musicvenues', 'pubs',
'sportsbars', 'wine_bars']])
nightlife_nb_scaled[:5]
nightlife_nb_scaled.std(axis=0)
array([1., 1., 1., 1., 1., 1., 1., 1., 1.])
nightlife_nb_scaled.mean(axis=0)
array([ 3.70074342e-17,  9.25185854e-18,  0.00000000e+00, -3.70074342e-17,
-1.85037171e-17, -1.85037171e-17, 1.85037171e-17, -5.55111512e-17,
3.70074342e-17])
# Perform the fit
kmeans.fit(nightlife_nb_scaled)
# Extract the labels
nightlife_nb_wide['label'] = kmeans.labels_
(
alt.Chart(nightlife_nb_wide)
.mark_circle()
.encode(
alt.X("bars:Q", scale=alt.Scale(zero=False)),
alt.Y("musicvenues:Q", scale=alt.Scale(zero=False)),
color=alt.Color("label:N", scale=alt.Scale(scheme="dark2")),
tooltip=list(nightlife_nb_wide.columns),
)
.properties(width=800, height=600)
.interactive()
)
nightlife_nb_wide.groupby("label").size()
label
0 35
1 5
2 1
3 5
4 2
dtype: int64
nightlife_clusters = nightlife_nb_wide.merge(tract, how = "left", on = "neighborhood_name")
nightlife_clusters_gdf = gpd.GeoDataFrame(nightlife_clusters, geometry = "geometry")
nightlife_clusters_gdf.explore(
column="label",
cmap="Set3",
categorical=True,
legend=True,
tiles="CartoDB positron"
)

District profiles
clusters = gpd.read_file("data/clusters.geojson")
districts = gpd.read_file("data/Planning_Districts.geojson")[["DIST_NAME", "geometry"]]
districts_proj = districts.to_crs(2272)
cluster_ctr = clusters.copy()
cluster_ctr.geometry = cluster_ctr.geometry.centroid
tract_dist = cluster_ctr.sjoin(districts_proj)
district_profile = tract_dist.groupby("DIST_NAME")[['arts', 'beauty', 'education', 'entertainment', 'grocery',
'healthcare', 'historic', 'kids', 'nightlife', 'parks',
'restaurant', 'shopping']].mean()
district_profile.to_csv('district_profile.csv')
district_profile
|   | arts | beauty | education | entertainment | grocery | healthcare | historic | kids | nightlife | parks | restaurant | shopping |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DIST_NAME | ||||||||||||
| Central | 5.971490 | 10.281469 | 9.449355 | 10.740264 | 8.900751 | 0.360445 | 1.637355 | 5.939139 | 14.530610 | 4.821984 | 16.498055 | 10.869083 |
| Central Northeast | 0.540962 | 25.421063 | 4.728159 | 0.540962 | 4.113855 | 0.000000 | 0.000000 | 10.192575 | 4.474206 | 3.506944 | 33.184762 | 13.296511 |
| Lower Far Northeast | 0.461133 | 24.714222 | 14.267947 | 2.071256 | 3.328687 | 0.000000 | 0.000000 | 10.617565 | 7.457475 | 0.000000 | 29.417666 | 7.664049 |
| Lower North | 1.301211 | 13.859954 | 6.465486 | 4.081751 | 8.906960 | 0.000000 | 0.763305 | 2.175683 | 5.643713 | 1.702007 | 47.696623 | 7.403306 |
| Lower Northeast | 1.315789 | 21.832727 | 13.772161 | 1.621478 | 11.047101 | 0.000000 | 0.000000 | 4.521085 | 3.503324 | 0.000000 | 35.050175 | 7.336160 |
| Lower Northwest | 0.359902 | 13.301367 | 13.765191 | 2.918120 | 6.960742 | 0.089127 | 0.000000 | 11.537796 | 10.622454 | 7.661078 | 27.755092 | 5.029132 |
| Lower South | 0.910295 | 3.846154 | 1.956815 | 13.644446 | 1.315789 | 0.000000 | 0.000000 | 10.312004 | 12.756644 | 1.923077 | 48.105357 | 5.229420 |
| Lower Southwest | 0.000000 | 13.670149 | 1.217391 | 0.500000 | 4.281950 | 0.000000 | 0.000000 | 0.909091 | 4.568511 | 5.000000 | 65.396386 | 4.456522 |
| North | 0.265252 | 22.470343 | 6.713256 | 1.144556 | 4.142242 | 0.000000 | 0.000000 | 1.706497 | 8.540435 | 0.662162 | 39.768439 | 14.586818 |
| North Delaware | 1.353491 | 19.597775 | 3.998782 | 2.436825 | 5.255671 | 0.339080 | 0.000000 | 9.639178 | 9.953720 | 3.217949 | 38.253892 | 5.953636 |
| River Wards | 1.992442 | 17.306946 | 6.730324 | 5.083619 | 7.024615 | 0.000000 | 0.000000 | 3.097565 | 11.344671 | 0.860001 | 35.939342 | 10.620475 |
| South | 3.374176 | 15.880016 | 5.555589 | 6.464443 | 16.421297 | 0.128205 | 0.072464 | 4.316686 | 13.612930 | 3.406275 | 23.779671 | 6.988249 |
| University Southwest | 2.290731 | 12.430496 | 12.013138 | 7.345937 | 4.753251 | 0.441176 | 2.500000 | 5.935079 | 8.711911 | 2.033190 | 34.115762 | 7.429327 |
| Upper Far Northeast | 2.757153 | 24.947634 | 11.107189 | 6.078872 | 4.107238 | 0.000000 | 0.000000 | 3.549484 | 8.009225 | 2.721088 | 29.963477 | 6.758640 |
| Upper North | 0.603448 | 22.641249 | 9.716666 | 1.808242 | 6.328772 | 0.000000 | 0.000000 | 0.835392 | 2.162648 | 0.000000 | 49.953789 | 5.949795 |
| Upper Northwest | 2.829100 | 21.347420 | 8.959644 | 4.127680 | 7.336850 | 0.631103 | 1.577909 | 4.020826 | 2.262047 | 0.603468 | 35.234163 | 11.069790 |
| West | 0.063131 | 36.707531 | 8.104885 | 0.914465 | 7.138201 | 0.000000 | 0.000000 | 4.999307 | 7.647183 | 1.850168 | 29.028258 | 3.546871 |
| West Park | 1.573720 | 13.094920 | 6.222580 | 9.617938 | 5.851341 | 0.000000 | 0.000000 | 4.660455 | 9.496647 | 1.428571 | 34.637795 | 13.416033 |
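Replacing each tract polygon with its centroid turns the spatial join into a point-in-polygon test, so every tract is assigned to exactly one district rather than all districts it overlaps. A minimal shapely sketch of that test (coordinates invented):

```python
from shapely.geometry import Point, Polygon

# Toy district polygon and two tract centroids
district = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])
inside = Point(2, 2)
outside = Point(6, 1)

# sjoin on centroids reduces to this containment check per point
print(district.contains(inside), district.contains(outside))
```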
For reference, the mean share across all twelve amenity categories for each cluster label:
| type | arts | beauty | education | entertainment | grocery | healthcare | historic | kids | nightlife | parks | restaurant | shopping |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| label | ||||||||||||
| 0 | 1.029109 | 35.168462 | 5.336747 | 3.139291 | 6.920761 | 0.318833 | 0.152848 | 3.986580 | 8.459712 | 0.830587 | 19.255118 | 15.401953 |
| 1 | 9.011412 | 6.799441 | 11.747031 | 17.897042 | 6.421039 | 0.079158 | 0.880370 | 3.914400 | 18.284042 | 2.420829 | 11.698266 | 10.846969 |
| 2 | 0.577297 | 5.976170 | 7.180234 | 5.102304 | 3.245692 | 0.505051 | 3.044482 | 25.018387 | 9.605864 | 21.546735 | 16.871668 | 1.326115 |
| 3 | 0.371984 | 16.328841 | 5.427419 | 1.220435 | 4.626384 | 0.009956 | 0.292610 | 2.990855 | 4.661643 | 0.793312 | 59.421667 | 3.854895 |
| 4 | 0.571429 | 3.822222 | 34.893651 | 0.000000 | 31.874603 | 0.000000 | 0.000000 | 7.266667 | 5.666667 | 0.571429 | 5.666667 | 9.666667 |